
Polibits

On-line version ISSN 1870-9044

Polibits no. 43, México, Jan./Jun. 2011

 

Low Cost Construction of a Multilingual Lexicon from Bilingual Lists

 

Lian Tze Lim*, Bali Ranaivo-Malançon**, and Enya Kong Tang***

 

Natural Language Processing Special Interest Group, Faculty of Information Technology, Multimedia University, Malaysia (e-mail: *liantze@gmail.com, **ranaivo@mmu.edu.my, ***enyakong@mmu.edu.my).

 

Manuscript received November 2, 2010.
Manuscript accepted for publication January 22, 2011.

 

Abstract

Manually constructing multilingual translation lexicons can be very costly in both time and human effort. Although there have been many efforts at (semi-)automatically merging bilingual machine-readable dictionaries to produce a multilingual lexicon, most of these approaches place quite specific requirements on the input bilingual resources. Unfortunately, not all bilingual dictionaries fulfil these criteria, especially in the case of under-resourced language pairs. We describe a low cost method for constructing a multilingual lexicon using only simple lists of bilingual translation mappings. The method is especially suitable for under-resourced language pairs, as such bilingual resources are often freely available and easily obtainable from the Internet, or digitised from simple, conventional paper-based dictionaries. The precision of random samples of the resultant multilingual lexicon is around 0.70–0.82, while the coverage for each language, as well as precision and recall, can be controlled by varying threshold values. Given the very simple input resources, our results are encouraging, especially for incorporating under-resourced languages into multilingual lexical resources.

Key words: Lexical resources, multilingual lexicon, under-resourced languages.

 


 

ACKNOWLEDGMENT

The work reported in this paper is supported by a Fundamental Research Grant (FRGS/1/10/TK/MMU/02/02) from the Malaysian Ministry of Higher Education. We thank the evaluators who participated in the results evaluation, and the two anonymous reviewers for their comments on improving this paper.

 
