On-line version ISSN 1870-9044
Polibits no. 43, México, Jan./Jun. 2011
Examining the Validity of Cross-Lingual Word Sense Disambiguation
Els Lefever* and Veronique Hoste**
LT3, University College Ghent, Groot-Brittannielaan 45, Ghent, Belgium and Dpt. of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281 (S9), Ghent, Belgium (email: *Els.Lefever@hogent.be, **Veronique.Hoste@hogent.be).
Manuscript received November 6, 2010.
Manuscript accepted for publication January 12, 2011.
This paper describes a set of experiments that investigate the viability of a classification-based Word Sense Disambiguation system that uses evidence from multiple languages. Instead of using a predefined monolingual sense inventory such as WordNet, we use a language-independent framework and start from a manually constructed gold standard in which the word senses are made up of the translations that result from word alignments on a parallel corpus. To train and test the classifier, we used English as the input language and incorporated the translations of our target words in five languages (viz. Spanish, Italian, French, Dutch and German) as features in the feature vectors. Our results show that the multilingual approach outperforms the classification experiments in which no additional evidence from other languages is used. These results confirm our initial hypothesis that each language adds evidence to further refine the senses of a given word, and they allow us to develop a proof of concept for a multilingual approach to Word Sense Disambiguation.
Key words: Word Sense Disambiguation, multilingual, cross-lingual.
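The idea of adding translations as features can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not name the classifier or the exact feature layout, so the nearest-neighbour learner with an overlap metric, the feature names (`ctx_*`, `trans_*`), the sense labels, and the two toy training instances below are all assumptions made for illustration only.

```python
from collections import Counter

# The five languages named in the abstract; the codes are illustrative.
LANGS = ["es", "it", "fr", "nl", "de"]

def make_features(context_words, translations):
    """Build one flat feature dict: English context words plus one
    translation slot per additional language ("_" when unavailable)."""
    feats = {f"ctx_{i}": w for i, w in enumerate(context_words)}
    for lang in LANGS:
        feats[f"trans_{lang}"] = translations.get(lang, "_")
    return feats

def overlap(a, b):
    """Count the feature slots on which two instances agree."""
    return sum(1 for k in a if a[k] == b.get(k))

def classify(train, query, k=1):
    """Return the majority label among the k most similar training instances."""
    ranked = sorted(train, key=lambda ex: overlap(ex[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data for the English target word "bank".
FIN = {"es": "banco", "it": "banca", "fr": "banque", "nl": "bank", "de": "Bank"}
RIV = {"es": "orilla", "it": "riva", "fr": "rive", "nl": "oever", "de": "Ufer"}
raw_train = [
    (["the", "bank", "approved"], FIN, "financial_institution"),
    (["the", "bank", "of"], RIV, "river_side"),
]
query_ctx, query_trans = ["the", "bank", "of"], FIN  # e.g. "the bank of England"

# English-only setting: translation slots are left empty everywhere.
mono_train = [(make_features(c, {}), y) for c, _, y in raw_train]
mono_pred = classify(mono_train, make_features(query_ctx, {}))

# Multilingual setting: translation slots are filled in.
multi_train = [(make_features(c, t), y) for c, t, y in raw_train]
multi_pred = classify(multi_train, make_features(query_ctx, query_trans))

print(mono_pred, multi_pred)
```

In this toy setting the English context alone matches the wrong training instance, while the five translation features tip the decision towards the other sense. It shows only the mechanism by which extra languages can add disambiguating evidence; it says nothing about the size of the effect reported in the paper.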