On-line version ISSN 1870-9044
Polibits no. 41, México, Jan./Jun. 2010
Special section: processing of semantic information
Retrieving Lexical Semantics from Multilingual Corpora
Ahmad R. Shahid and Dimitar Kazakov
Manuscript received March 24, 2010.
Manuscript accepted for publication June 14, 2010.
This paper presents a technique to build a lexical resource used for annotation of parallel corpora where the tags can be seen as multilingual 'synsets'. The approach can be extended to add relationships between these synsets that are akin to WordNet relationships of synonymy and hypernymy. The paper also discusses how the success of this approach can be measured. The reported results are for English, German, French, and Greek using the Europarl parallel corpus.
Key words: Multilingual corpora, lexical relations.