
 

Polibits

Online version ISSN 1870-9044

Polibits no. 43, México, Jan./Jun. 2011

 

Examining the Validity of Cross-Lingual Word Sense Disambiguation

 

Els Lefever* and Veronique Hoste**

 

LT3, University College Ghent, Groot-Brittannielaan 45, Ghent, Belgium and Dept. of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281 (S9), Ghent, Belgium (e-mail: *Els.Lefever@hogent.be, **Veronique.Hoste@hogent.be).

 

Manuscript received November 6, 2010.
Manuscript accepted for publication January 12, 2011.

 

Abstract

This paper describes a set of experiments investigating the viability of a classification-based Word Sense Disambiguation system that uses evidence from multiple languages. Instead of using a predefined monolingual sense inventory such as WordNet, we use a language-independent framework and start from a manually constructed gold standard in which the word senses are made up of the translations that result from word alignments on a parallel corpus. To train and test the classifier, we used English as the input language and incorporated the translations of our target words in five languages (viz. Spanish, Italian, French, Dutch and German) as features in the feature vectors. Our results show that the multilingual approach outperforms classification experiments in which no additional evidence from other languages is used. These results confirm our initial hypothesis that each language adds evidence that further refines the senses of a given word, and they provide a proof of concept for a multilingual approach to Word Sense Disambiguation.

Key words: Word Sense Disambiguation, multilingual, cross-lingual.
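
The idea of adding translations to the feature vector can be illustrated with a minimal sketch. It is not the system evaluated in the paper: the training sentences are invented, the classifier is a plain nearest-neighbour vote over shared features, and the choice of the Dutch translation as the sense label with the other languages as features is an assumption made here for illustration. The names `build_features` and `classify` are hypothetical.

```python
from collections import Counter

# Toy, invented training instances for the English noun "bank".
# Each instance: (English context, translations of the target word, sense label).
# Assumption: the Dutch translation serves as the sense label and the other
# languages supply features; the paper's actual encoding may differ.
RAW_TRAIN = [
    ("they walked along the river bank",
     {"es": "orilla", "fr": "rive", "it": "riva", "de": "Ufer"}, "oever"),
    ("he deposited money at the bank",
     {"es": "banco", "fr": "banque", "it": "banca", "de": "Bank"}, "bank"),
]


def build_features(context, translations=None):
    """Return a set of binary features: English context words, plus one
    language-prefixed feature per available translation of the target word."""
    features = {f"en:{word}" for word in context.lower().split()}
    for lang, word in (translations or {}).items():
        features.add(f"{lang}:{word.lower()}")
    return features


def classify(train, query, k=1):
    """k-nearest-neighbour vote using feature overlap as the similarity."""
    ranked = sorted(train, key=lambda item: len(item[0] & query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


TRAIN = [(build_features(ctx, trans), label) for ctx, trans, label in RAW_TRAIN]

# The English context alone shares only "the" and "bank" with both training
# instances; the Spanish and French translations add two more matches with
# the financial sense.
query = build_features("the bank approved the loan",
                       {"es": "banco", "fr": "banque"})
print(classify(TRAIN, query))
```

In this toy setting the English-only features tie between the two senses, while the translation features separate them, which is the intuition behind treating each additional language as extra evidence.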

 


 
