Computación y Sistemas
On-line version ISSN 2007-9737 / Print version ISSN 1405-5546
Comp. y Sist. vol. 13, no. 4, Ciudad de México, Apr./Jun. 2010
Doctoral thesis summary
Prototype Selection Methods
Graduated: José Arturo Olvera López
National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro # 1, Santa María Tonantzintla, C.P. 72840, Puebla, México. aolvera@inaoep.mx
Advisor: Jesús Ariel Carrasco Ochoa
National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro # 1, Santa María Tonantzintla, C.P. 72840, Puebla, México. ariel@inaoep.mx
Advisor: José Francisco Martínez Trinidad
National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro # 1, Santa María Tonantzintla, C.P. 72840, Puebla, México. fmartine@inaoep.mx
Graduated on April 16th, 2009
Abstract
In pattern recognition, supervised classifiers assign a class to unseen objects, or prototypes. To classify new prototypes, a training set is used, which provides information to the classifier during the training stage. In practice, not all of the information in a training set is useful, so some irrelevant prototypes can be discarded. This process is known as prototype selection, and it is the main topic of this thesis. Prototype selection reduces the training set size, which in turn reduces the runtimes of the classification and/or training stages of classifiers.
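To make the idea concrete, here is a minimal sketch of one classic prototype selection scheme, Wilson-style editing (ENN), which discards every prototype that the majority of its k nearest neighbors misclassifies. This illustrates the general process only, not one of the methods proposed in the thesis; the value of k and the toy data are assumptions made for the example.

```python
# Minimal prototype selection by Wilson-style editing (ENN):
# keep a prototype only if the majority vote of its k nearest
# neighbors agrees with its own class label.
# Illustration only, not one of the thesis's methods.
import numpy as np

def enn_select(X, y, k=3):
    """Return indices of the prototypes kept by nearest-neighbor editing."""
    keep = []
    for i in range(len(X)):
        # Squared Euclidean distances from prototype i to all prototypes.
        d = ((X - X[i]) ** 2).sum(axis=1)
        d[i] = np.inf                          # exclude the prototype itself
        nearest = np.argsort(d)[:k]            # its k nearest neighbors
        labels, counts = np.unique(y[nearest], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:  # majority vote agrees?
            keep.append(i)
    return np.array(keep)

# Toy usage: two Gaussian classes with some overlap.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
kept = enn_select(X, y, k=3)
print(f"kept {kept.size} of {len(X)} prototypes")
```

Editing of this kind removes noisy prototypes near the class boundary; the complementary classic strategy, condensation, instead keeps only the prototypes needed to preserve the nearest-neighbor decision boundary.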
Several methods have been proposed for selecting prototypes; however, their performance is strongly tied to the use of a specific classifier, and most of them require long runtimes when large datasets are processed.
In this thesis, four prototype selection methods are proposed that address drawbacks of some methods in the state of the art. The first two are based on sequential floating search, and the remaining two are based on clustering and on prototype relevance, respectively; illustrative sketches of these ideas are given below.
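The summary does not spell out the two floating-search-based methods, so the following is only a rough sketch of how sequential backward floating search (floating search in the sense of Pudil et al.) can drive prototype selection. Taking 1-NN accuracy over the training set as the criterion J is an assumption made for the example, not necessarily the thesis's criterion.

```python
# Sketch: prototype selection by sequential backward floating search.
# The criterion J(S) is 1-NN accuracy over all prototypes, using the
# current subset S as references (an assumed criterion, for illustration).
import numpy as np

def criterion(X, y, subset):
    """J(S): 1-NN accuracy over all prototypes, with leave-one-out
    for prototypes that belong to the subset."""
    refs = np.fromiter(subset, dtype=int)
    correct = 0
    for i in range(len(X)):
        d = ((X[refs] - X[i]) ** 2).sum(axis=1)
        d[refs == i] = np.inf          # do not match a prototype to itself
        correct += y[refs[np.argmin(d)]] == y[i]
    return correct / len(X)

def sbfs_select(X, y, target_size):
    """Shrink the prototype set to target_size with floating steps."""
    n = len(X)
    S = set(range(n))
    best_at = {n: criterion(X, y, S)}  # best J seen at each subset size
    while len(S) > target_size:
        # Exclusion: drop the prototype whose removal hurts J the least.
        scores = {p: criterion(X, y, S - {p}) for p in S}
        p_out = max(scores, key=scores.get)
        S.remove(p_out)
        best_at[len(S)] = max(best_at.get(len(S), -1.0), scores[p_out])
        # Conditional (floating) inclusion: restore a discarded prototype
        # while doing so strictly beats the best J recorded at that size.
        improved = True
        while improved and len(S) + 1 < n:
            gains = {p: criterion(X, y, S | {p}) for p in set(range(n)) - S}
            p_in = max(gains, key=gains.get)
            improved = gains[p_in] > best_at.get(len(S) + 1, -1.0)
            if improved:
                S.add(p_in)
                best_at[len(S)] = gains[p_in]
    return sorted(S)

# Tiny demo on a toy 2-class set.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(3, 1, (15, 2))])
y = np.array([0] * 15 + [1] * 15)
print(sbfs_select(X, y, target_size=6))
```

The floating (conditional inclusion) step is what distinguishes this from plain backward selection: a previously discarded prototype is restored whenever doing so strictly improves the best criterion value recorded at that subset size.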
Keywords: prototype selection, data reduction, sequential selection, border prototypes.
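Finally, the keyword "border prototypes" suggests the clustering-based idea of keeping interior representatives of class-homogeneous regions while preserving the prototypes near class boundaries. The sketch below is a hypothetical simplification of that idea, not the thesis's algorithm: homogeneous clusters are replaced by the prototype nearest their mean, while in mixed clusters the prototypes whose nearest in-cluster neighbor belongs to another class are kept.

```python
# Sketch of a clustering-based selection idea (simplified assumption,
# not the thesis's algorithm): keep one representative per homogeneous
# cluster and the border prototypes of mixed clusters.
import numpy as np
from sklearn.cluster import KMeans

def cluster_select(X, y, n_clusters=10):
    """Return indices of the selected prototypes."""
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(X)
    keep = []
    for c in range(n_clusters):
        idx = np.where(cluster_ids == c)[0]
        if idx.size == 0:
            continue
        if np.unique(y[idx]).size == 1:
            # Homogeneous cluster: the prototype nearest the cluster mean
            # is enough to represent the whole region.
            center = X[idx].mean(axis=0)
            keep.append(idx[np.argmin(((X[idx] - center) ** 2).sum(axis=1))])
        else:
            # Mixed cluster: keep prototypes whose nearest neighbor inside
            # the cluster has a different class (border prototypes).
            for i in idx:
                d = ((X[idx] - X[i]) ** 2).sum(axis=1)
                d[idx == i] = np.inf
                if y[idx[np.argmin(d)]] != y[i]:
                    keep.append(i)
    return np.array(sorted(keep))

# Toy usage: overlapping classes, so some clusters are mixed.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2.5, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
kept = cluster_select(X, y, n_clusters=8)
print(f"kept {kept.size} of {len(X)} prototypes")
```

Because clustering is performed once and each cluster is processed locally, schemes of this kind avoid the repeated global evaluations that make wrapper-style selection slow on large datasets.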