SciELO - Scientific Electronic Library Online

 

Polibits

On-line version ISSN 1870-9044

Polibits no. 41, México, Jan./Jun. 2010

 

Special section: processing of semantic information

 

Retrieving Lexical Semantics from Multilingual Corpora

 

Ahmad R. Shahid and Dimitar Kazakov

 

Department of Computer Science, University of York, YO10 5DD, UK. (ahmad@cs.york.ac.uk; kazakov@cs.york.ac.uk).

 

Manuscript received March 24, 2010.
Manuscript accepted for publication June 14, 2010.

 

Abstract

This paper presents a technique for building a lexical resource for annotating parallel corpora, in which the tags can be seen as multilingual 'synsets'. The approach can be extended to add relationships between these synsets that are akin to the WordNet relationships of synonymy and hypernymy. The paper also discusses how the success of this approach can be measured. Results are reported for English, German, French, and Greek, using the Europarl parallel corpus.

Key words: Multilingual corpora, lexical relations.

 


 
