Polibits

On-line version ISSN 1870-9044

Polibits n.52, México, Jul./Dec. 2015

https://doi.org/10.17562/PB-52-2 

EMiner: A Tool for Selecting Classification Algorithms and Optimal Parameters

 

Rayrone Zirtany Nunes Marques, Luciano Reis Coutinho, Tiago Bonini Borchartt, Samyr Béliche Vale, and Francisco José da Silva e Silva

 

Universidade Federal do Maranhão, Programa de Pós-Graduação em Ciência da Computação, Av. dos Portugueses 1966, Bacanga, São Luís, MA, Brazil (e-mail: rayronezirtany@gmail.com, lrc@deinf.ufma.br, tiagobonini@deinf.ufma.br, samy@deinf.ufma.br, fssilva@deinf.ufma.br).

 

Manuscript received on July 14, 2015
Accepted for publication on September 18, 2015
Published on October 15, 2015

 

Abstract

In this paper, a Genetic Algorithm (GA) is used to search for combinations of learning algorithms and associated parameter values that maximize classification accuracy. An important feature of the approach is that the GA's initial population is formed from parameter values gathered from ExpDB, a public database of data mining experiments. The approach was implemented in a tool called EMiner, built on top of a grid-based software infrastructure for developing collaborative applications in the medicine and healthcare domains (the ECADeG project). Experiments were performed on 16 datasets from the UCI repository. The results show that combining the data from ExpDB with the GA is effective at finding classification models with good accuracy.

Key words: Data mining, medicine and healthcare, algorithm selection, parameter optimization, genetic algorithms.
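The abstract states the idea only at a high level: a GA searches over (learning algorithm, parameter values) pairs, and its initial population is seeded with parameter values taken from past experiments rather than drawn purely at random. The Python sketch below illustrates that general technique under stated assumptions; it is not EMiner's implementation. The search space (`SPACE`), its algorithm names and parameter ranges, the hard-coded `SEEDS` records (a stand-in for values retrieved from ExpDB) and `toy_accuracy` (a stand-in for the measured accuracy of a trained classifier) are all hypothetical, and the tournament selection, uniform crossover, Gaussian mutation and elitism used here are generic GA choices, not necessarily the operators used in the paper.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space: algorithm -> {parameter: (low, high, is_int)}.
SPACE: Dict[str, Dict[str, Tuple[float, float, bool]]] = {
    "tree": {"confidence": (0.01, 0.5, False), "min_leaf": (1, 50, True)},
    "knn": {"k": (1, 30, True)},
    "svm": {"cost": (0.01, 100.0, False), "gamma": (0.0001, 1.0, False)},
}
Individual = Tuple[str, Dict[str, float]]  # (algorithm, parameter values)


def clip(value: float, low: float, high: float, is_int: bool) -> float:
    """Keep a parameter inside its range, rounding integer parameters."""
    value = min(max(value, low), high)
    return round(value) if is_int else value


def random_individual(rng: random.Random) -> Individual:
    algo = rng.choice(sorted(SPACE))
    params = {p: clip(rng.uniform(lo, hi), lo, hi, is_int)
              for p, (lo, hi, is_int) in SPACE[algo].items()}
    return algo, params


def initial_population(seeds: List[Individual], size: int,
                       rng: random.Random) -> List[Individual]:
    """Seeded records first (copied), then random individuals up to `size`."""
    population = [(algo, dict(params)) for algo, params in seeds[:size]]
    while len(population) < size:
        population.append(random_individual(rng))
    return population


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Uniform crossover if both parents share an algorithm; else copy parent a."""
    if a[0] != b[0]:
        return a[0], dict(a[1])
    return a[0], {p: (a[1][p] if rng.random() < 0.5 else b[1][p]) for p in a[1]}


def mutate(ind: Individual, rate: float, rng: random.Random) -> Individual:
    """Sometimes replace with a random individual (possibly another algorithm);
    otherwise perturb each parameter with Gaussian noise at probability `rate`."""
    algo, params = ind
    if rng.random() < rate / 2:
        return random_individual(rng)
    new = dict(params)
    for p, (lo, hi, is_int) in SPACE[algo].items():
        if rng.random() < rate:
            new[p] = clip(new[p] + rng.gauss(0, 0.1 * (hi - lo)), lo, hi, is_int)
    return algo, new


def tournament(scored: List[Tuple[float, Individual]], rng: random.Random,
               k: int = 3) -> Individual:
    return max(rng.sample(scored, k), key=lambda s: s[0])[1]


def evolve(fitness: Callable[[Individual], float], seeds: List[Individual],
           size: int = 20, generations: int = 30, rate: float = 0.2,
           seed: int = 0) -> Tuple[Individual, float]:
    rng = random.Random(seed)
    population = initial_population(seeds, size, rng)
    best_fit, best = float("-inf"), population[0]
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda s: s[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        nxt = [scored[0][1]]  # elitism: the current best survives unchanged
        while len(nxt) < size:
            child = crossover(tournament(scored, rng), tournament(scored, rng), rng)
            nxt.append(mutate(child, rate, rng))
        population = nxt
    return best, best_fit


# Hypothetical stand-ins: SEEDS plays the role of records retrieved from an
# experiment database, and toy_accuracy replaces training + evaluating a model.
SEEDS: List[Individual] = [("tree", {"confidence": 0.25, "min_leaf": 2}),
                           ("svm", {"cost": 1.0, "gamma": 0.01}), ("knn", {"k": 5})]


def toy_accuracy(ind: Individual) -> float:
    algo, p = ind
    if algo == "svm":
        return 0.90 - 0.001 * abs(p["cost"] - 10) - 0.5 * abs(p["gamma"] - 0.1)
    if algo == "tree":
        return 0.85 - 0.2 * abs(p["confidence"] - 0.25) - 0.001 * p["min_leaf"]
    return 0.80 - 0.005 * abs(p["k"] - 7)


if __name__ == "__main__":
    best, acc = evolve(toy_accuracy, SEEDS)
    print(best, round(acc, 3))
```

In practice, `toy_accuracy` would be replaced by a function that trains the selected algorithm with the given parameters and returns its estimated accuracy on the target dataset, which dominates the running time. With elitism and a deterministic fitness function, the returned configuration is never worse than the best seeded one; seeding changes where the search starts but does not by itself guarantee improvement beyond it.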

 


ACKNOWLEDGMENT

The authors thank FAPEMA (State of Maranhão Research Agency, Brazil) for supporting this work under grant INRIA-00114/11.

 

All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.