Information Retrieval with Word Sense Disambiguation for Spanish

Ledo Mezquita, Yoel; Sidorov, Grigori; Gelbukh, Alexander


Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 11 no. 3, Mexico City, Jan./Mar. 2008

Doctoral thesis summary


Information Retrieval with Word Sense Disambiguation for Spanish

Graduated: Yoel Ledo Mezquita
Centro de Investigación en Computación (CIC-IPN)
Av. Juan de Dios Bátiz sn esq. Miguel Othón de Mendizábal
C. P. 07738 México D. F.,
e-mail: yledo@yahoo.com

Advisor: Grigori Sidorov
Centro de Investigación en Computación (CIC-IPN)
Av. Juan de Dios Bátiz sn esq. Miguel Othón de Mendizábal
C. P. 07738 México D. F.,
www.cic.ipn.mx/~sidorov

Advisor: Alexander Gelbukh
Centro de Investigación en Computación (CIC-IPN)
Av. Juan de Dios Bátiz sn esq. Miguel Othón de Mendizábal
C. P. 07738 México D. F.,
www.cic.ipn.mx/~sidorov

Graduated on June 23, 2006

Summary

A common problem of information retrieval portals on the Internet (the dynamic search portals of Altavista, Google, Yahoo, etc.) and of digital libraries (the U.S. Library of Congress, etc.) is that they return many results of very low relevance. For example, a car mechanic searching for "¿dónde comprar un gato?" ("where to buy a gato", where gato means both "car jack" and "cat") gets results about wildcats ("gatos monteses"), Siamese cats ("gatos siameses"), and the like. A fruit merchant searching for "producción de lima" ("lime production") gets results about the city of Lima ("ciudad de Lima"), lime juice ("jugo de lima"), nail files ("lima de uñas"), and so on. These inaccuracies arise because words have several senses; the task of resolving this ambiguity is known as Word Sense Disambiguation (WSD). WSD determines the correct sense of a word from the context in which it is used, choosing among the word's possible senses. The contribution of this article is a new word sense disambiguation method that uses large lexical resources (explanatory dictionaries, synonym dictionaries, WordNet).
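The dictionary-overlap idea behind the Lesk algorithm (named in the keywords) can be sketched on the "gato" example. This is a minimal illustration of the classic simplified Lesk variant, not the method developed in the thesis: the two-sense inventory, its English glosses, the stopword list, and the helper names `simplified_lesk` and `filter_by_sense` are all hypothetical. The query word is disambiguated once, and only documents in which the same word resolves to the same sense are kept, which is one way WSD can raise retrieval precision.

```python
import re

# Hypothetical two-sense inventory for the Spanish word "gato"; the glosses
# are written in English for readability and are NOT taken from the thesis.
SENSES = {
    "gato": {
        "cat": "small domestic feline animal with fur whiskers and claws",
        "jack": "tool a mechanic uses to lift a car when changing a tire",
    }
}

# Small hypothetical stopword list so function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "with", "when", "where",
             "can", "uses", "is", "so"}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> str:
    """Return the sense of `word` whose gloss shares the most words with `context`."""
    ctx = tokens(context) - {word}
    glosses = SENSES[word]
    # max() returns the first maximal sense, so ties keep the first-listed one.
    return max(glosses, key=lambda sense: len(ctx & tokens(glosses[sense])))


def filter_by_sense(word: str, query: str, documents: list[str]) -> list[str]:
    """Keep only documents where `word` resolves to the same sense as in the query."""
    target = simplified_lesk(word, query)
    return [d for d in documents if simplified_lesk(word, d) == target]


query = "where can a mechanic buy a gato to lift the car"
docs = [
    "the siamese gato is a feline with blue eyes and soft fur",
    "a hydraulic gato can lift a car so the mechanic can change a tire",
]
relevant = filter_by_sense("gato", query, docs)
```

Here the query resolves to the "jack" sense (its gloss shares "mechanic", "lift", and "car" with the query), so only the second document is kept. Ties, including the zero-overlap case, fall back to the first-listed sense; a common alternative is to fall back to the most frequent sense.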


Abstract

One of the problems of information retrieval on the Internet and in digital libraries is low precision: many of the retrieved documents have low relevance. For example, a person looks for information about jaguars (the animal) and retrieves documents about the car model. This problem arises from lexical ambiguity: many words have several senses. The task of determining the correct sense of a word in its context is known as Word Sense Disambiguation (WSD). It selects, from among a word's possible senses, the one that best suits the context in which the word is used. This paper proposes a new word sense disambiguation method that draws on additional linguistic information about the context words, taken from large lexical resources such as explanatory dictionaries, synonym dictionaries, and WordNet.
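The idea of enriching the context words with information from lexical resources can be illustrated by expanding the context with synonyms before computing the gloss overlap. This is one possible reading, sketched under assumptions: the "jaguar" glosses, the `SYNONYMS` table (a stand-in for a synonym dictionary), and the function `extended_lesk` are hypothetical, and the procedure actually used in the thesis may differ.

```python
import re

# Hypothetical glosses for the two senses of "jaguar" (illustrative only).
GLOSSES = {
    "animal": "large wild cat of the american jungle",
    "car": "british brand of luxury automobile",
}

# Hypothetical synonym table standing in for a large synonym dictionary.
SYNONYMS = {
    "feline": {"cat"},
    "rainforest": {"jungle", "forest"},
    "sedan": {"automobile", "car"},
}

STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is"}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def expand(words: set[str]) -> set[str]:
    """Add the direct synonyms of each context word (one level, not transitive)."""
    expanded = set(words)
    for w in words:
        expanded |= SYNONYMS.get(w, set())
    return expanded


def extended_lesk(context: str, use_synonyms: bool) -> tuple[str, dict[str, int]]:
    """Pick the sense of "jaguar" whose gloss overlaps most with the (expanded) context."""
    ctx = tokens(context) - {"jaguar"}
    if use_synonyms:
        ctx = expand(ctx)
    scores = {sense: len(ctx & tokens(gloss)) for sense, gloss in GLOSSES.items()}
    # max() returns the first maximal key, so ties go to the first-listed sense.
    return max(scores, key=scores.get), scores


for sentence in ["I drove my new jaguar sedan",
                 "The jaguar is a powerful feline of the rainforest"]:
    print(sentence, extended_lesk(sentence, False), extended_lesk(sentence, True))
```

With the plain context, neither gloss shares a word with "I drove my new jaguar sedan", so the tie falls to the first-listed sense ("animal"); after expansion, "sedan" contributes "automobile", which matches the car gloss and selects "car".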

Keywords: word senses, context, dictionaries, Lesk algorithm.

