Efficiently Finding the Optimum Number of Clusters in a Dataset with a New Hybrid Cellular Evolutionary Algorithm

Arellano-Verdejo, Javier; Guzmán-Arenas, Adolfo; Godoy-Calderon, Salvador; Barrón Fernández, Ricardo

doi:10.13053/CyS-18-2-2014-034


Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Comp. y Sist. vol.18 n.2 Ciudad de México Apr./Jun. 2014


Regular Articles


Javier Arellano-Verdejo, Adolfo Guzmán-Arenas, Salvador Godoy-Calderon, and Ricardo Barrón Fernández

Centro de Investigación en Computación, Instituto Politécnico Nacional, México D.F., Mexico. jarellanob10@sagitario.cic.ipn.mx, a.guzman@acm.org, sgodoyc@cic.ipn.mx, rbarron@cic.ipn.mx

Abstract

A challenge in hybrid evolutionary algorithms is to employ efficient strategies that cover the entire search space while applying local search only in the most promising regions. Clustering algorithms, a fundamental basis for data mining procedures and learning techniques, in turn lack efficient methods for determining the optimal number of clusters in an arbitrary dataset. Some existing methods use evolutionary algorithms with a cluster validation index as the objective function. In this article, a new cellular evolutionary algorithm based on a hybrid model of global and local heuristic search is proposed for the same task, and extensive experiments are carried out with different datasets and validation indexes.

Keywords. Clustering, cellular genetic algorithm, micro-evolutionary algorithms, particle swarm optimization, optimal number of clusters.
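To make the idea of a cluster validation index as an objective function concrete, the following minimal sketch scores candidate cluster counts on a toy dataset and keeps the best one. It is not the cellular evolutionary algorithm proposed in the article: a plain exhaustive scan over k stands in for the evolutionary search, the Davies-Bouldin index is used only as one illustrative validity index, and the helper names (`kmeans`, `davies_bouldin`, `make_blobs`, `best_k`) as well as the synthetic three-blob data are assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index (lower is better); assumes distinct cluster centers."""
    k = len(centers)
    scatter = []
    for j in range(k):
        dists = [math.dist(p, centers[j]) for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(dists) / len(dists) if dists else 0.0)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


def make_blobs(seed=0, per_blob=30):
    """Hypothetical toy data: three well-separated square blobs in 2-D."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in blob_centers for _ in range(per_blob)]


def best_k(points, k_values=range(2, 7)):
    """Score every candidate k with the validity index and return the minimizer."""
    scores = {}
    for k in k_values:
        centers, labels = kmeans(points, k)
        scores[k] = davies_bouldin(points, centers, labels)
    return min(scores, key=scores.get), scores


points = make_blobs()
k_star, scores = best_k(points)
print(k_star, {k: round(v, 3) for k, v in scores.items()})
```

In an evolutionary setting, `davies_bouldin` would play the role of the fitness function evaluated on each candidate partition, and the loop in `best_k` would be replaced by the population-based search; the scan is used here only because it is the simplest way to show the index driving the choice of k.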



Acknowledgements

The authors would like to express their gratitude to SIP-IPN, CONACyT and ICyT-DF for their financial support of this research, in particular through grants SIP-20130932 and ICyT-PICCO-10-113.
