Integración de modelos de agrupamiento y reglas de asociación obtenidos de múltiples fuentes de datos

Morales Vega, Daymi; Martín Rodríguez, Diana; Wilford Rivera, Ingrid; Rosete Suárez, Alejandro


Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol.16 n.2 Ciudad de México Apr./Jun. 2012



Integration of Association Rules and Clustering Models Obtained from Multiple Data Sources


Instituto Superior Politécnico José Antonio Echeverría, La Habana, Cuba
dmorales@ceis.cujae.edu.cu, dmartin@ceis.cujae.edu.cu, iwilford@ceis.cujae.edu.cu, rosete@ceis.cujae.edu.cu

Article received 04/02/2011.
Accepted 10/10/2011.

Summary

One possible way to discover knowledge in distributed databases with Data Mining techniques is to reuse the local data mining models obtained from each database and integrate them into global patterns. This process must be carried out without accessing the data directly. This work proposes two methods for integrating Data Mining models, namely Association Rule models and Clustering models: specifically, association rules obtained with support and confidence as quality measures, and centroid-based clusterings. The local models were obtained by analyzing multiple homogeneous datasets. The experimental study shows that good-quality global models are obtained in reasonable time as the number of local patterns to integrate increases.

Keywords. Integration, data mining models, association rules, clustering, patterns.
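The abstract describes reusing local association rule models, characterized by support and confidence, to obtain global rules without touching the raw data. As a minimal sketch of what such an integration step could look like, the Python below combines rules from homogeneous sources by weighting each source's support by its number of transactions, and recovers the antecedent's support from the identity support(X) = support(X ∪ Y) / confidence(X → Y). This weighted synthesis is an illustrative assumption, not the method proposed in the paper (which, per the English abstract, relies on metaheuristics). The names `LocalRuleModel` and `integrate_rules`, the per-source transaction counts, and the choice to treat a rule missing from a source as contributing zero are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent X, consequent Y) of the rule X -> Y


@dataclass
class LocalRuleModel:
    """Hypothetical container for one source's association rule model."""
    n_transactions: int
    # rule -> (support, confidence), both fractions in [0, 1]
    rules: Dict[Rule, Tuple[float, float]] = field(default_factory=dict)


def integrate_rules(models: List[LocalRuleModel], min_sup: float,
                    min_conf: float) -> Dict[Rule, Tuple[float, float]]:
    """Estimate global support/confidence from local models only (no raw data).

    Assumption: a source that does not report a rule contributes nothing to it,
    so global support is a lower-bound estimate.
    """
    total = sum(m.n_transactions for m in models)
    count_xy: Dict[Rule, float] = {}  # estimated transactions containing X and Y
    count_x: Dict[Rule, float] = {}   # estimated transactions containing X
    for m in models:
        for rule, (sup, conf) in m.rules.items():
            if sup <= 0 or conf <= 0:
                continue
            xy = sup * m.n_transactions
            count_xy[rule] = count_xy.get(rule, 0.0) + xy
            # support(X) = support(X u Y) / confidence(X -> Y)
            count_x[rule] = count_x.get(rule, 0.0) + xy / conf
    global_rules: Dict[Rule, Tuple[float, float]] = {}
    for rule, xy in count_xy.items():
        g_sup = xy / total
        g_conf = xy / count_x[rule]
        if g_sup >= min_sup and g_conf >= min_conf:
            global_rules[rule] = (g_sup, g_conf)
    return global_rules


bread_milk = (frozenset({"bread"}), frozenset({"milk"}))
milk_eggs = (frozenset({"milk"}), frozenset({"eggs"}))
source_a = LocalRuleModel(100, {bread_milk: (0.4, 0.8)})
source_b = LocalRuleModel(300, {bread_milk: (0.2, 0.5), milk_eggs: (0.1, 0.9)})
# bread -> milk: support 0.25, confidence 100/170 (about 0.588);
# milk -> eggs is dropped (global support 0.075 < 0.1)
print(integrate_rules([source_a, source_b], min_sup=0.1, min_conf=0.5))
```

Because a source that omits a rule is counted as contributing nothing, a rule that is frequent in one source can fall below the global threshold once the other sources' transactions are included, as milk → eggs does here.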

Abstract

One possible way to discover knowledge over distributed data sources using Data Mining techniques is to reuse the local mining models found in each data source and search them for globally valid patterns. This process can be done without accessing the data directly. This paper proposes two methods for integrating data mining models, Association Rule models and Clustering models: specifically, association rules obtained using support and confidence as quality measures, and centroid-based clusterings. Metaheuristic algorithms were needed to find a global model that is as close as possible to the local models. The local models were obtained from homogeneous data sources. The experimental study showed that the proposed methods obtain good-quality global models in reasonable time as the number of local patterns to integrate increases.

Keywords. Integration, data mining models, association rules, clustering, patterns.
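The English abstract states that metaheuristics are used to find a global model as close as possible to the local models. For centroid-based clustering, one way to make this concrete is to search for k global centroids that minimize the size-weighted sum of squared distances from every local centroid to its nearest global centroid. The sketch below does this with a simple stochastic hill climber (Gaussian perturbation of one centroid per step, keeping only improving moves). The objective, the step size, the cluster-size weights, and the names `integrate_centroids`, `cost` and `sq_dist` are assumptions for illustration; the paper's actual objective and metaheuristic may differ.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical local model: list of (centroid, cluster_size) pairs from one source.
LocalClusterModel = List[Tuple[Point, int]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(global_c: List[Point], points: List[Point], weights: List[float]) -> float:
    """Size-weighted squared distance from each local centroid to its nearest global one."""
    return sum(w * min(sq_dist(p, g) for g in global_c)
               for p, w in zip(points, weights))


def integrate_centroids(local_models: List[LocalClusterModel], k: int,
                        iters: int = 2000, step: float = 0.5,
                        seed: int = 0) -> List[Point]:
    """Stochastic hill climbing over k global centroids (illustrative objective)."""
    rng = random.Random(seed)
    points = [c for model in local_models for c, _ in model]
    weights = [float(size) for model in local_models for _, size in model]
    # Start from k local centroids chosen at random without replacement.
    current = rng.sample(points, k)
    best = cost(current, points, weights)
    for _ in range(iters):
        # Perturb one randomly chosen global centroid with Gaussian noise.
        i = rng.randrange(k)
        candidate = list(current)
        candidate[i] = tuple(x + rng.gauss(0.0, step) for x in current[i])
        c = cost(candidate, points, weights)
        if c < best:  # keep only strictly improving moves
            current, best = candidate, c
    return current


# Two homogeneous sources, each summarized by two centroids and their sizes.
source_a = [((0.0, 0.0), 50), ((10.0, 10.0), 50)]
source_b = [((1.0, 1.0), 30), ((11.0, 11.0), 70)]
print(integrate_centroids([source_a, source_b], k=2))
```

Since only improving moves are accepted, the search can stall in a local optimum on harder inputs; running it from several seeds and keeping the lowest-cost result is a common remedy.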

