SciELO - Scientific Electronic Library Online


Journal of applied research and technology

On-line version ISSN 2448-6736; Print version ISSN 1665-6423

J. appl. res. technol. vol. 12 n. 1, Ciudad de México, Feb. 2014

 

AnaPro, Tool for Identification and Resolution of Direct Anaphora in Spanish

 

I. Toledo-Gómez1, E. Valtierra-Romero1, A. Guzmán-Arenas*2, A. Cuevas-Rasgado3, L. Méndez-Segundo1

 

1 Escuela Superior de Cómputo, Instituto Politécnico Nacional, México, D. F., México.

2 Centro de Investigación en Computación, Instituto Politécnico Nacional, México, D. F., México. *a.guzman@acm.org

3 Universidad Autónoma del Estado de México, Centro Universitario Texcoco, Estado de México.

 

Abstract

AnaPro is software that resolves direct anaphora in Spanish, specifically pronouns: it finds the noun or group of words to which a pronoun refers by locating, in the preceding sentences, the referent or antecedent that the pronoun replaces. An example of a direct anaphor to be resolved is the pronoun "he" in the sentence "He is sad." Because much of the work on anaphora has been done for texts in English, we focus specifically on Spanish documents.

AnaPro directly supports text analysis (understanding what a document says), a nontrivial task because texts differ in writing style, references, idiomatic expressions, and so on. The problem grows if the analyzer is a computer, because it lacks the "common sense" that people possess. Hence, before a text is analyzed, it must be preprocessed in order to assign a tag (noun, verb, etc.) to each word, find the stems, disambiguate nouns, verbs, and prepositions, identify colloquial expressions, and identify and resolve anaphora, among other chores.

AnaPro works on Spanish sentences. The procedure is novel in that it is automatic (no user intervenes during the resolution) and it does not need dictionaries. It employs heuristic procedures to discover the semantics and support its decisions; these are rather easy to implement and use limited knowledge. Even so, its results are good (at least 81% correct answers), although more tests will give a better idea of its quality.
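To make the idea of a limited-knowledge heuristic concrete, the following is a minimal sketch of one generic rule: pick the nearest preceding noun that agrees with the pronoun in gender and number. It is not AnaPro's actual rule set, which this abstract does not describe. The `Token` type, the tag names, the `window` parameter, and the first example sentence are all hypothetical, and the sketch assumes that an upstream tagger has already supplied part of speech, gender, and number for each word.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    text: str
    pos: str     # hypothetical tag names: "NOUN", "PRON", "VERB", ...
    gender: str  # "m", "f", or "" when not applicable
    number: str  # "s", "p", or "" when not applicable


def resolve_pronoun(sentences: List[List[Token]], sent_idx: int,
                    tok_idx: int, window: int = 2) -> Optional[Token]:
    """Return the nearest preceding noun agreeing with the pronoun, or None.

    Assumption: recency plus gender/number agreement stands in for the
    heuristics of a real resolver; it is an illustration only.
    """
    pronoun = sentences[sent_idx][tok_idx]
    first = max(0, sent_idx - window)
    # Search the pronoun's own sentence first, then earlier sentences.
    for s in range(sent_idx, first - 1, -1):
        tokens = sentences[s]
        end = tok_idx if s == sent_idx else len(tokens)
        # Scan right to left so that the closest candidate wins.
        for t in range(end - 1, -1, -1):
            cand = tokens[t]
            if (cand.pos == "NOUN"
                    and cand.gender == pronoun.gender
                    and cand.number == pronoun.number):
                return cand
    return None


# Illustrative input: "Juan perdió la carta. Él está triste."
sentences = [
    [Token("Juan", "NOUN", "m", "s"), Token("perdió", "VERB", "", "s"),
     Token("la", "DET", "f", "s"), Token("carta", "NOUN", "f", "s")],
    [Token("Él", "PRON", "m", "s"), Token("está", "VERB", "", "s"),
     Token("triste", "ADJ", "", "s")],
]

antecedent = resolve_pronoun(sentences, sent_idx=1, tok_idx=0)
print(antecedent.text if antecedent else None)  # prints: Juan
```

Here "carta" is closer to the pronoun but is rejected because it is feminine, so the search continues back to "Juan". A rule this simple fails whenever two agreeing nouns compete, which is why practical resolvers combine several heuristics.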

Keywords: I.2 Artificial Intelligence, I.2.7 Natural Language Processing, Text Analysis, Anaphora Resolution.

 

Summary (Resumen)

AnaPro is software that resolves direct anaphora in Spanish, especially pronouns: the tool finds the noun or group of words to which the pronoun refers. It locates, in the preceding sentences, the referent or antecedent that the corresponding pronoun replaces. An example of a resolved direct anaphor is the pronoun "él" in the sentence "Él está triste".

AnaPro directly supports text analysis (understanding what the document says), a nontrivial task because there are different forms and styles of writing and referencing, as well as regional expressions (Mexicanisms, Argentinisms, etc.). It becomes even harder if the analyzer is a machine, because a machine has no "common sense" (which any person can have). Thus, before the text is analyzed, it must be preprocessed to assign tags (nouns and verbs, among others) to each word, find the stems, disambiguate nouns, verbs, and prepositions, identify colloquial expressions, and identify and resolve anaphora, among other tasks.

AnaPro works with Spanish sentences. It is a novel process because it is fully automatic (it requires no user intervention) and needs no dictionaries. It uses a heuristic procedure to discover the semantics and to support the decisions it must make. It has proven to perform well (81-100% correct answers); however, more new examples will give a better idea of its performance.

 


 

Acknowledgements

Authors I.T. and E.V. would like to acknowledge ESCOM-IPN, where they defended their thesis, #20110083, which gives a more detailed description of AnaPro. The work reported herein was partially sponsored by CONACYT Grant #128163 (Project OM*), by IPN (A.G. as Resident Scientist), and by SNI.

 


Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.