Online version ISSN 1870-9044

Polibits no. 41, México, Jan./Jun. 2010


Special section: processing of semantic information


Retrieving Lexical Semantics from Multilingual Corpora


Ahmad R. Shahid and Dimitar Kazakov


Department of Computer Science, University of York, YO10 5DD, UK.


Manuscript received March 24, 2010.
Manuscript accepted for publication June 14, 2010.



This paper presents a technique for building a lexical resource used to annotate parallel corpora, in which the tags can be seen as multilingual 'synsets'. The approach can be extended to add relationships between these synsets akin to the WordNet relations of synonymy and hypernymy. The paper also discusses how the success of this approach can be measured. Results are reported for English, German, French, and Greek, using the Europarl parallel corpus.

Key words: Multilingual corpora, lexical relations.
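The abstract does not spell out how the multilingual tags are derived, so the following is only a minimal sketch under one plausible reading: the corpus is assumed to be word-aligned already (alignment itself is out of scope here), and each English token is tagged with the tuple of words it is aligned to in German, French, and Greek. The input format, the names `multilingual_tag`, `build_tag_lexicon`, and `candidate_synonyms`, and the rule that English words sharing a translation tuple are candidate synonyms are all hypothetical illustrations, not the authors' published procedure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical input format: one record per English token, mapping each
# language code to the word it is aligned with (None if unaligned).
AlignedToken = Dict[str, Optional[str]]
Tag = Tuple[str, ...]

# Fixed language order used to build each tag (an assumption; English first).
LANGS = ("en", "de", "fr", "el")  # English, German, French, Greek


def multilingual_tag(token: AlignedToken) -> Optional[Tag]:
    """Return the lower-cased aligned words in LANGS order, or None
    if the token lacks an aligned word in any language."""
    words = [token.get(lang) for lang in LANGS]
    if any(w is None for w in words):
        return None
    return tuple(w.lower() for w in words)


def build_tag_lexicon(tokens: List[AlignedToken]) -> Dict[str, Dict[Tag, int]]:
    """Map each English word to the multilingual tags it was seen with,
    counting how often each tag occurs."""
    counts: Dict[str, Dict[Tag, int]] = defaultdict(lambda: defaultdict(int))
    for token in tokens:
        tag = multilingual_tag(token)
        if tag is not None:
            counts[tag[0]][tag] += 1  # tag[0] is the English word
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {word: dict(tags) for word, tags in counts.items()}


def candidate_synonyms(lexicon: Dict[str, Dict[Tag, int]]) -> Dict[Tag, Set[str]]:
    """Group English words by their non-English translations; words that
    share the same (de, fr, el) tuple are returned as candidate synonyms."""
    groups: Dict[Tag, Set[str]] = defaultdict(set)
    for word, tags in lexicon.items():
        for tag in tags:
            groups[tag[1:]].add(word)
    return {key: words for key, words in groups.items() if len(words) > 1}


if __name__ == "__main__":
    # Toy, invented data for illustration only (not taken from Europarl).
    toy_tokens: List[AlignedToken] = [
        {"en": "house", "de": "Haus", "fr": "maison", "el": "σπίτι"},
        {"en": "home", "de": "Haus", "fr": "maison", "el": "σπίτι"},
        {"en": "house", "de": "Haus", "fr": "maison", "el": "σπίτι"},
        {"en": "vote", "de": "Abstimmung", "fr": "vote", "el": None},
    ]
    lexicon = build_tag_lexicon(toy_tokens)
    print(lexicon["house"])
    print(candidate_synonyms(lexicon))
```

With the toy tokens, "house" and "home" share the translation tuple ("haus", "maison", "σπίτι") and are grouped as candidate synonyms, while the "vote" token is skipped because it has no Greek alignment. Requiring an aligned word in every language keeps each tag unambiguous but discards any token with a missing alignment, which is the main trade-off of this simple representation.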






Creative Commons License: All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.