Combinación de clasificadores para bioinformática

Bonet, Isis; Rodríguez, Abdel; García, María M.; Grau, Ricardo

Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 16 no. 2, Ciudad de México, Apr./Jun. 2012

Articles

Combining Classifiers for Bioinformatics

Isis Bonet, Abdel Rodríguez, María M. García, and Ricardo Grau

Centro de Estudios de Informática, Universidad Central Marta Abreu de Las Villas, Cuba. E-mail: ibonetc@gmail.com

Article received on 22 February 2011.
Accepted on 19 October 2012.

Abstract

Bioinformatics poses many classification problems that are difficult to solve with artificial intelligence techniques because of the diversity of patterns in its datasets. This paper develops a multi-classifier, an ensemble that combines classifiers, with the aim of improving classification results on bioinformatics datasets. The model uses several different machine learning methods as a form of clustering: the dataset is divided according to which cases each base method classifies correctly. By means of a meta-classifier, the system then learns to decide which classifier or classifiers are best for a given case. Eleven international databases are used to compare the proposed model with the best-known multi-classifiers in the literature. Statistical tests show that the results obtained by the new multi-classifier are significantly better than those obtained with the other models.

Keywords: classification, pattern recognition, learning, multi-classifiers.
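
The routing idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two threshold rules used as base classifiers, the 1-nearest-neighbour meta-classifier, the XOR-style toy data, and every name (`SelectorEnsemble`, `clf_a`, `clf_b`, `true_label`, `nearest_label`) are hypothetical choices made for this example only.

```python
from collections import Counter

# Hypothetical base classifiers: two opposite threshold rules on feature 0.
def clf_a(x):
    return 1 if x[0] > 0.5 else 0

def clf_b(x):
    return 1 if x[0] <= 0.5 else 0

# Hypothetical ground truth: clf_a is right when feature 1 is low,
# clf_b is right when feature 1 is high (an XOR-like problem).
def true_label(x):
    return clf_a(x) if x[1] < 0.5 else clf_b(x)

def nearest_label(points, labels, x):
    """1-nearest-neighbour lookup using squared Euclidean distance."""
    best = min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], x)))
    return labels[best]

class SelectorEnsemble:
    """Learns which base classifier(s) to trust for a given case."""

    def __init__(self, base_classifiers):
        self.base = base_classifiers
        self.points = []   # training cases kept for the meta-classifier
        self.experts = []  # indices of the base classifiers correct on each case

    def fit(self, X, y):
        # Group training cases by which base classifiers classify them correctly.
        for x, label in zip(X, y):
            correct = tuple(i for i, clf in enumerate(self.base) if clf(x) == label)
            if correct:  # cases that no classifier solves give no routing signal
                self.points.append(x)
                self.experts.append(correct)
        return self

    def predict(self, x):
        # Meta-classifier (assumed 1-NN here) picks the classifier(s) to consult,
        # then the selected classifiers vote.
        selected = nearest_label(self.points, self.experts, x)
        votes = [self.base[i](x) for i in selected]
        return Counter(votes).most_common(1)[0][0]

X_train = [(a, b) for a in (0.1, 0.9) for b in (0.1, 0.9)]
y_train = [true_label(x) for x in X_train]
model = SelectorEnsemble([clf_a, clf_b]).fit(X_train, y_train)

X_test = [(0.2, 0.2), (0.8, 0.8), (0.2, 0.8), (0.8, 0.2)]
predictions = [model.predict(x) for x in X_test]
```

In this toy setting each base rule alone is correct on only half of the test cases, while the selector recovers the right label by routing each case to the rule that handled its nearest training case. Any real use would replace the toy rules and the 1-NN selector with trained models.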
