
Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 14 no. 2, Mexico City, Oct./Dec. 2010

 

Articles

 

A Language-Independent Method for Answering Definition Questions

Original title: Un método independiente del idioma para responder preguntas de definición

 

Claudia Denicia Carral, Luis Villaseñor Pineda, Manuel Montes y Gómez

 

Laboratorio de Tecnologías del Lenguaje, Coordinación de Ciencias Computacionales, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Tonantzintla, Puebla, Mexico. E-mail: cdenicia@inaoep.mx, villasen@inaoep.mx, mmontesg@inaoep.mx

 

Received June 6, 2007.
Accepted April 17, 2009.

 

Abstract

This paper describes a method for answering definition questions that relies exclusively on lexical patterns and is therefore language independent. The method applies two text-mining steps. The first step focuses on discovering a set of surface lexical patterns from definition examples retrieved from the Web. Subsequently, the discovered patterns are used to extract a set of concept-description pairs from a given target document collection. The second mining step determines the most adequate answer to each specific question. Experimental results were obtained with the CLEF 2005 and 2006 datasets for the monolingual tasks in Spanish, French, and Italian. These results demonstrate the relevance of the method, which achieved high precision for all three languages.

Keywords: H. Information Systems, H.3 Information Storage and Retrieval, H.3.4 Systems and Software, Question-Answering Systems, Definition Questions.
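
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the seed sentences, the function names, the `min_support` threshold, the capitalized-word heuristic for concepts, and the use of a simple frequency vote in place of the second mining step are all assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict

CONCEPT, DESCRIPTION = "<CONCEPT>", "<DESCRIPTION>"


def discover_patterns(seed_examples, min_support=2):
    """Step 1a: generalize seed definitions into surface patterns.

    Each seed is (sentence, concept, description). The known concept and
    description are replaced by slots, and the text spanning both slots is
    counted. Spans seen at least `min_support` times are kept as patterns.
    """
    counts = Counter()
    for sentence, concept, description in seed_examples:
        template = sentence.replace(description, DESCRIPTION, 1)
        template = template.replace(concept, CONCEPT, 1)
        span = re.search(
            r"<CONCEPT>.*?<DESCRIPTION>|<DESCRIPTION>.*?<CONCEPT>", template
        )
        if span:
            counts[span.group(0)] += 1
    return [pattern for pattern, n in counts.items() if n >= min_support]


def compile_pattern(pattern):
    """Turn a slot pattern into a regex with named capture groups."""
    regex = re.escape(pattern)
    # Assumption of this sketch: a concept is a run of capitalized words.
    regex = regex.replace(
        re.escape(CONCEPT), r"(?P<concept>[A-Z]\w+(?: [A-Z]\w+)*)"
    )
    # Assumption of this sketch: a description ends at the next punctuation.
    regex = regex.replace(
        re.escape(DESCRIPTION), r"(?P<description>[^,.;()]+)"
    )
    return re.compile(regex)


def extract_pairs(patterns, documents):
    """Step 1b: apply the patterns to the target document collection."""
    catalog = defaultdict(Counter)
    for pattern in patterns:
        regex = compile_pattern(pattern)
        for doc in documents:
            for match in regex.finditer(doc):
                description = match.group("description").strip()
                catalog[match.group("concept")][description] += 1
    return catalog


def answer(catalog, concept):
    """Step 2 (simplified stand-in): return the most frequent description."""
    descriptions = catalog.get(concept)
    if not descriptions:
        return None
    return descriptions.most_common(1)[0][0]


# Hypothetical seed definitions (in the paper these come from the Web).
seeds = [
    ("Kofi Annan, secretary general of the UN, arrived today.",
     "Kofi Annan", "secretary general of the UN"),
    ("Jacques Chirac, president of France, spoke first.",
     "Jacques Chirac", "president of France"),
    ("The Italian writer Umberto Eco was born in Alessandria.",
     "Umberto Eco", "Italian writer"),
]
# Hypothetical target collection.
documents = [
    "Angela Merkel, chancellor of Germany, visited Paris.",
    "Reporters asked Angela Merkel, chancellor of Germany, about the budget.",
    "Angela Merkel, leader of the CDU, won the vote.",
]

patterns = discover_patterns(seeds)
catalog = extract_pairs(patterns, documents)
print(patterns)                          # ['<CONCEPT>, <DESCRIPTION>']
print(answer(catalog, "Angela Merkel"))  # chancellor of Germany
```

Because the patterns are plain character sequences learned from examples, nothing in the sketch depends on a parser or tagger for a particular language, which is the property the abstract emphasizes. The frequency vote in `answer` only hints at the role of the second step, which the abstract describes as a text-mining procedure in its own right.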

 


 

Acknowledgments

The authors thank Alberto Téllez, Antonio Juárez, Esaú Villatoro, and Manuel Alberto Pérez for their valuable participation in developing the system that took part in the CLEF 2005 and 2006 evaluations. This work was supported by CONACYT (Project Ref. No. 43990 and scholarship 189692) and by SNI-México. The authors also thank the EFE agency and CLEF for the resources provided and for the evaluation tasks used in this work.

 


All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.