Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Comp. y Sist. vol.18 n.3 Ciudad de México Jul./Sep. 2014

https://doi.org/10.13053/CyS-18-3-2034 

Regular articles

 

Entity Extraction in Biochemical Text using Multiobjective Optimization

 

Utpal Kumar Sikdar, Asif Ekbal, and Sriparna Saha

 

Department of Computer Science and Engineering, Indian Institute of Technology, Patna, India. utpal.sikdar@iitp.ac.in, asif@iitp.ac.in, sriparna@iitp.ac.in.

 

Article received on 18/01/2014.
Accepted on 01/02/2014.

 

Abstract

In this paper we propose a feature selection and classifier ensemble approach for biochemical entity extraction based on multiobjective modified differential evolution. The algorithm operates in two layers. The first layer determines an appropriate set of features for the task within the framework of a supervised statistical classifier, namely a Conditional Random Field (CRF). This produces a set of solutions, a subset of which is used to construct an ensemble in the second layer. The proposed approach is evaluated on entity extraction in chemical texts, which involves identifying IUPAC and IUPAC-like names and classifying them into predefined categories. Experiments carried out on a benchmark dataset show recall, precision and F-measure values of 86.15%, 91.29% and 88.64%, respectively.

Keywords: Multiobjective modified differential evolution (MODE), feature selection, ensemble learning, conditional random field (CRF), named entity (NE).
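
The two-layer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the differential evolution search and the CRF training are omitted, the objective pair (recall, precision) is an assumption about which objectives are optimized, the candidate scores and BIO-style tags are hypothetical, and plurality voting is only one possible way to combine the selected classifiers. The sketch shows how non-dominated solutions could be picked from a population and how their outputs could be merged; the `f_measure` helper computes the harmonic mean of recall and precision, which for the reported 86.15% and 91.29% agrees with the reported 88.64% to within rounding.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed objective pair: (recall, precision), both to be maximized.
Objectives = Tuple[float, float]


def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Objectives]) -> List[int]:
    """Indices of the solutions that no other solution dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token labels from several classifiers by plurality vote."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical (recall, precision) scores for four candidate feature subsets.
candidate_scores = [(0.80, 0.92), (0.85, 0.88), (0.78, 0.85), (0.84, 0.87)]
front = pareto_front(candidate_scores)

# Hypothetical per-token outputs of three classifiers on a three-token sentence.
outputs = [
    ["B-IUPAC", "I-IUPAC", "O"],
    ["B-IUPAC", "O", "O"],
    ["B-IUPAC", "I-IUPAC", "O"],
]
combined = majority_vote(outputs)
```

In this toy population, the first two candidates trade recall against precision and neither dominates the other, so both would be available as ensemble members, while the remaining two are dominated and discarded.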



All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.