SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol.17 n.2 México Apr./Jun. 2013

 

Articles

 

Anaphora Resolution for Bengali: An Experiment with Domain Adaptation

 

Spanish title: Anaphora resolution for Bengali: an experiment with application to the domain

 

Utpal Kumar Sikdar¹, Asif Ekbal², Sriparna Saha³, Olga Uryupina⁴, and Massimo Poesio⁵

 

¹ Department of Computer Science & Engineering, Indian Institute of Technology Patna, Patna, India (utpal.sikdar@iitp.ac.in)

² Department of Computer Science & Engineering, Indian Institute of Technology Patna, Patna, India (asif@iitp.ac.in)

³ Department of Computer Science & Engineering, Indian Institute of Technology Patna, Patna, India (sriparna@iitp.ac.in)

⁴ Department of Information Engineering & Computer Science, University of Trento, Italy (uryupina@gmail.com)

⁵ Department of Information Engineering & Computer Science, University of Trento, Italy (massimo.poesio@unitn.it)

 

Article received on 21/12/2012
Accepted on 16/01/2013.

 

Abstract

In this paper we present our first attempt at anaphora resolution for a resource-poor language, namely Bengali. We address the issue of adapting BART, a state-of-the-art system originally developed for English. The overall performance of co-reference resolution depends heavily on highly accurate mention detectors. We develop a number of models based on the heuristics used as well as on the particular machine learning method employed. We then perform a series of experiments to adapt BART to Bengali. Our evaluation shows that a language-dependent system (designed primarily for English) can achieve a good level of performance when it is re-trained and tested on a new language with appropriate subsets of features. The system achieves recall, precision and F-measure values of 56.00%, 46.50% and 50.80%, respectively. The contribution of this work is two-fold: (i) an attempt to build a machine-learning-based anaphora resolution system for a resource-poor Indian language; and (ii) the domain adaptation of a state-of-the-art English co-reference resolution system to Bengali, a language with completely different orthography and characteristics.
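The three scores reported in the abstract are mutually consistent if the F-measure is the balanced harmonic mean of precision and recall (F1). The page does not state which F-measure variant is used, so this is an assumption. A minimal Python check under that assumption (the function name `f_measure` is illustrative only):

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Recall and precision as reported in the abstract (percentages).
recall, precision = 56.00, 46.50
f1 = f_measure(precision, recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 50.81"
```

The computed 50.81 differs from the reported 50.80 only in the last digit, which is presumably an effect of the published precision and recall being rounded. The `beta` parameter is included only for generality; F1 corresponds to `beta = 1`.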

Keywords: Anaphora/Co-reference resolution, CRF-based mention detection, Bengali, BART.
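The keywords name CRF-based mention detection. A CRF tagger outputs one label per token, and those labels have to be turned into mention spans before a coreference system can link them. This page does not describe the authors' tag set, so the sketch below assumes a plain BIO scheme with hypothetical labels such as `B-NP`/`I-NP`; it covers only decoding the tagger's output, not training the CRF, and `bio_to_spans` is an illustrative name, not part of BART.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, mention label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Turn a BIO tag sequence (e.g. from a CRF tagger) into mention spans.

    'B-X' opens a mention of type X, 'I-X' continues it, 'O' is outside.
    An 'I-X' that does not continue an open mention of type X is treated
    leniently as the start of a new mention.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, typ = ("O", None) if tag == "O" else tag.split("-", 1)
        continues = prefix == "I" and start is not None and typ == label
        if continues:
            continue
        # Close the mention that is currently open, if any.
        if start is not None and label is not None:
            spans.append((start, i, label))
        start, label = (i, typ) if prefix in ("B", "I") else (None, None)
    # A mention may run to the end of the sentence.
    if start is not None and label is not None:
        spans.append((start, len(tags), label))
    return spans


tags = ["B-NP", "I-NP", "O", "B-NP", "B-NP", "I-NP"]
print(bio_to_spans(tags))  # [(0, 2, 'NP'), (3, 4, 'NP'), (4, 6, 'NP')]
```

Treating a stray `I-X` as the start of a new mention, rather than discarding it, is one common convention; a stricter decoder could drop such tokens instead.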

 

Abstract (Spanish version)

This article presents the first attempt at anaphora resolution for a language with scarce linguistic resources, namely Bengali, by adapting BART, a state-of-the-art system originally developed for English. The overall performance of co-reference-based resolution depends to a large extent on high-precision mention detectors. Several models were developed based on the heuristics used and on the selected machine learning method. A series of experiments was carried out to adapt BART to Bengali. The evaluation shows that a language-dependent system (designed primarily for English) can achieve good performance after re-training and testing on the new language with appropriate sets of features. The system produces recall, precision and F-measure values of 56.00%, 46.50% and 50.80%, respectively.

Keywords: Anaphora/co-reference resolution, mention detection based on conditional random fields (CRF), Bengali, BART.

 


 

Acknowledgements

The authors acknowledge partial support from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant #288024, "LIMOSINE: Linguistically Motivated Semantic aggregation engiNes".

 


All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.