Computación y Sistemas
On-line version ISSN 2007-9737 / Print version ISSN 1405-5546
Comp. y Sist. vol. 13, no. 4, Ciudad de México, Apr./Jun. 2010
Doctoral thesis summary
Prototype Selection Methods
Graduated: José Arturo Olvera López
National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro # 1, Santa María Tonantzintla, C.P. 72840, Puebla, México. aolvera@inaoep.mx
Advisor: Jesús Ariel Carrasco Ochoa
National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro # 1, Santa María Tonantzintla, C.P. 72840, Puebla, México. ariel@inaoep.mx
Advisor: José Francisco Martínez Trinidad
National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro # 1, Santa María Tonantzintla, C.P. 72840, Puebla, México. fmartine@inaoep.mx
Graduated on April 16th, 2009
Abstract
In pattern recognition, supervised classifiers assign a class to unseen objects, or prototypes. To classify new prototypes, a training set is used, which provides information to the classifier during the training stage. In practice, not all of the information in a training set is useful, so some irrelevant prototypes can be discarded. This process is known as prototype selection, and it is the main topic of this thesis. Prototype selection reduces the training set size, which in turn reduces the runtimes of the classification and/or training stages of classifiers.
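To make the idea concrete, here is a minimal sketch of one classic prototype selection scheme, Wilson-style editing (ENN), which discards every prototype that the majority of its k nearest neighbors misclassifies. This illustrates the general process only, not one of the methods proposed in the thesis; the value of k and the toy data are assumptions made for the example.

```python
# Minimal prototype selection by Wilson-style editing (ENN):
# keep a prototype only if the majority vote of its k nearest
# neighbors agrees with its own class label.
# Illustration only, not one of the thesis's methods.
import numpy as np

def enn_select(X, y, k=3):
    """Return indices of the prototypes kept by nearest-neighbor editing."""
    keep = []
    for i in range(len(X)):
        # Squared Euclidean distances from prototype i to all prototypes.
        d = ((X - X[i]) ** 2).sum(axis=1)
        d[i] = np.inf                          # exclude the prototype itself
        nearest = np.argsort(d)[:k]            # its k nearest neighbors
        labels, counts = np.unique(y[nearest], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:  # majority vote agrees?
            keep.append(i)
    return np.array(keep)

# Toy usage: two Gaussian classes with some overlap.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
kept = enn_select(X, y, k=3)
print(f"kept {kept.size} of {len(X)} prototypes")
```

Editing of this kind removes noisy prototypes near the class boundary; the complementary classic strategy, condensation, instead keeps only the prototypes needed to preserve the nearest-neighbor decision boundary.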
Several methods have been proposed for selecting prototypes; however, their performance is strongly tied to the use of a specific classifier, and most of them require long runtimes when large datasets are processed.
In this thesis, four prototype selection methods are proposed that address drawbacks of some methods in the state of the art. The first two are based on sequential floating search, and the remaining two are based on clustering and on prototype relevance, respectively; illustrative sketches of these ideas are given below.
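The summary does not spell out the two floating-search-based methods, so the following is only a rough sketch of how sequential backward floating search (floating search in the sense of Pudil et al.) can drive prototype selection. Taking 1-NN accuracy over the training set as the criterion J is an assumption made for the example, not necessarily the thesis's criterion.

```python
# Sketch: prototype selection by sequential backward floating search.
# The criterion J(S) is 1-NN accuracy over all prototypes, using the
# current subset S as references (an assumed criterion, for illustration).
import numpy as np

def criterion(X, y, subset):
    """J(S): 1-NN accuracy over all prototypes, with leave-one-out
    for prototypes that belong to the subset."""
    refs = np.fromiter(subset, dtype=int)
    correct = 0
    for i in range(len(X)):
        d = ((X[refs] - X[i]) ** 2).sum(axis=1)
        d[refs == i] = np.inf          # do not match a prototype to itself
        correct += y[refs[np.argmin(d)]] == y[i]
    return correct / len(X)

def sbfs_select(X, y, target_size):
    """Shrink the prototype set to target_size with floating steps."""
    n = len(X)
    S = set(range(n))
    best_at = {n: criterion(X, y, S)}  # best J seen at each subset size
    while len(S) > target_size:
        # Exclusion: drop the prototype whose removal hurts J the least.
        scores = {p: criterion(X, y, S - {p}) for p in S}
        p_out = max(scores, key=scores.get)
        S.remove(p_out)
        best_at[len(S)] = max(best_at.get(len(S), -1.0), scores[p_out])
        # Conditional (floating) inclusion: restore a discarded prototype
        # while doing so strictly beats the best J recorded at that size.
        improved = True
        while improved and len(S) + 1 < n:
            gains = {p: criterion(X, y, S | {p}) for p in set(range(n)) - S}
            p_in = max(gains, key=gains.get)
            improved = gains[p_in] > best_at.get(len(S) + 1, -1.0)
            if improved:
                S.add(p_in)
                best_at[len(S)] = gains[p_in]
    return sorted(S)

# Tiny demo on a toy 2-class set.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(3, 1, (15, 2))])
y = np.array([0] * 15 + [1] * 15)
print(sbfs_select(X, y, target_size=6))
```

The floating (conditional inclusion) step is what distinguishes this from plain backward selection: a previously discarded prototype is restored whenever doing so strictly improves the best criterion value recorded at that subset size.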
Keywords: prototype selection, data reduction, sequential selection, border prototypes.
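Finally, the keyword "border prototypes" suggests the clustering-based idea of keeping interior representatives of class-homogeneous regions while preserving the prototypes near class boundaries. The sketch below is a hypothetical simplification of that idea, not the thesis's algorithm: homogeneous clusters are replaced by the prototype nearest their mean, while in mixed clusters the prototypes whose nearest in-cluster neighbor belongs to another class are kept.

```python
# Sketch of a clustering-based selection idea (simplified assumption,
# not the thesis's algorithm): keep one representative per homogeneous
# cluster and the border prototypes of mixed clusters.
import numpy as np
from sklearn.cluster import KMeans

def cluster_select(X, y, n_clusters=10):
    """Return indices of the selected prototypes."""
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(X)
    keep = []
    for c in range(n_clusters):
        idx = np.where(cluster_ids == c)[0]
        if idx.size == 0:
            continue
        if np.unique(y[idx]).size == 1:
            # Homogeneous cluster: the prototype nearest the cluster mean
            # is enough to represent the whole region.
            center = X[idx].mean(axis=0)
            keep.append(idx[np.argmin(((X[idx] - center) ** 2).sum(axis=1))])
        else:
            # Mixed cluster: keep prototypes whose nearest neighbor inside
            # the cluster has a different class (border prototypes).
            for i in idx:
                d = ((X[idx] - X[i]) ** 2).sum(axis=1)
                d[idx == i] = np.inf
                if y[idx[np.argmin(d)]] != y[i]:
                    keep.append(i)
    return np.array(sorted(keep))

# Toy usage: overlapping classes, so some clusters are mixed.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2.5, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
kept = cluster_select(X, y, n_clusters=8)
print(f"kept {kept.size} of {len(X)} prototypes")
```

Because clustering is performed once and each cluster is processed locally, schemes of this kind avoid the repeated global evaluations that make wrapper-style selection slow on large datasets.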