Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol.18 n.2 México Apr./Jun. 2014

http://dx.doi.org/10.13053/CyS-18-2-2014-033 

Regular articles

 

Attribute and Case Selection for NN Classifier through Rough Sets and Naturally Inspired Algorithms

 

Selección de atributos y casos para el clasificador NN a través de conjuntos aproximados y algoritmos inspirados en la naturaleza

 

Yenny Villuendas-Rey1 and Maria Matilde Garcia-Lorenzo2

 

1 Department of Computer Science, University of Ciego de Avila, Cuba. yenny@informatica.unica.cu

2 Department of Computer Science, University Marta Abreu of Las Villas, Cuba. mmgarcia@uclv.edu.cu

 

Abstract

Supervised classification is one of the most active research fields in the Artificial Intelligence community. Nearest Neighbor (NN) is one of the simplest and most consistently accurate approaches to supervised classification. Training set preprocessing is essential for obtaining high-quality classification results. This paper introduces an attribute and case selection algorithm that hybridizes Rough Set Theory with naturally inspired algorithms to improve NN performance. The proposed algorithm handles mixed, incomplete, and imbalanced datasets. Its performance was tested over repository databases, showing high classification accuracy while keeping few cases and attributes.

Keywords: Nearest neighbor, case selection, attribute selection.
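The abstract describes preprocessing for an NN classifier over mixed and incomplete data. As background, a minimal sketch of a 1-NN classifier with a HEOM-style heterogeneous distance (in the spirit of Wilson & Martinez's heterogeneous distance functions) might look as follows; the `heom` and `nn_classify` names, the [0, 1] normalization of numeric attributes, and the use of `None` for missing values are illustrative assumptions, not the authors' method.

```python
import math

def heom(x, y):
    """HEOM-style distance for mixed records (illustrative sketch).

    Numeric attributes are floats assumed pre-normalized to [0, 1],
    categorical attributes are strings; None marks a missing value.
    """
    total = 0.0
    for a, b in zip(x, y):
        if a is None or b is None:
            d = 1.0                      # maximal distance when a value is missing
        elif isinstance(a, str) or isinstance(b, str):
            d = 0.0 if a == b else 1.0   # overlap metric for categorical values
        else:
            d = abs(a - b)               # normalized numeric difference
        total += d * d
    return math.sqrt(total)

def nn_classify(train, query):
    """Return the class label of the nearest training case (1-NN)."""
    _, label = min(train, key=lambda case: heom(case[0], query))
    return label

# Toy training set: (features, class) pairs with mixed and missing values.
train = [((0.1, "red",  None), "A"),
         ((0.9, "blue", 0.8 ), "B")]
print(nn_classify(train, (0.2, "red", 0.5)))  # → A
```

Case and attribute selection, as studied in the paper, would shrink `train` and drop columns from the feature tuples before this classification step.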

 

Resumen

La clasificación supervisada constituye una de las áreas de investigación más activas dentro de la Inteligencia Artificial. La regla del vecino más cercano (NN) es una de las más simples y efectivas para la clasificación supervisada. El pre-procesamiento del conjunto de entrenamiento es esencial para obtener clasificaciones de alta calidad. En este artículo se introduce un nuevo algoritmo de selección de atributos y casos que utiliza un enfoque híbrido basado en los Conjuntos Aproximados y los algoritmos inspirados en la naturaleza para mejorar el desempeño de clasificadores NN. El algoritmo propuesto permite el manejo de conjuntos de datos mezclados, incompletos, y no balanceados. El desempeño de dicho algoritmo se analizó utilizando bases de datos de repositorio, mostrando una alta eficacia del clasificador, utilizando solamente pocos casos y atributos.

Palabras clave: Vecino más cercano, selección de casos, selección de atributos.

 

DOWNLOAD ARTICLE IN PDF FORMAT

 


All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.