On-line version ISSN 1870-9044
Polibits no. 43, México, Jan./Jun. 2011
Low Cost Construction of a Multilingual Lexicon from Bilingual Lists
Lian Tze Lim*, Bali Ranaivo-Malançon**, and Enya Kong Tang***
Manuscript received November 2, 2010.
Manuscript accepted for publication January 22, 2011.
Manually constructing multilingual translation lexicons can be very costly, both in terms of time and human effort. Although there have been many efforts at (semi-)automatically merging bilingual machine-readable dictionaries to produce a multilingual lexicon, most of these approaches place quite specific requirements on the input bilingual resources. Unfortunately, not all bilingual dictionaries fulfil these criteria, especially in the case of under-resourced language pairs. We describe a low-cost method for constructing a multilingual lexicon using only simple lists of bilingual translation mappings. The method is especially suitable for under-resourced language pairs, as such bilingual resources are often freely available and easily obtainable from the Internet, or digitised from simple, conventional paper-based dictionaries. The precision of random samples of the resultant multilingual lexicon is around 0.70–0.82, while coverage for each language, precision and recall can be controlled by varying threshold values. Given the very simple input resources, our results are encouraging, especially in incorporating under-resourced languages into multilingual lexical resources.
Key words: Lexical resources, multilingual lexicon, under-resourced languages.
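To make the idea in the abstract concrete, the following is a minimal sketch of how two plain bilingual lists might be merged through a shared pivot language, with a threshold that trades coverage against precision. It is not the procedure evaluated in the paper: the function name `merge_via_pivot`, the Dice-style overlap score, the sample word pairs and the threshold values are all assumptions chosen for illustration.

```python
from collections import defaultdict


def merge_via_pivot(src_to_pivot, pivot_to_tgt, threshold=0.5):
    """Link source and target words that share pivot-language translations.

    Both inputs are plain lists of (word, translation) pairs. The score is a
    Dice-style overlap of pivot translations; this scoring is an assumption
    made for illustration, not the measure used in the paper.
    """
    src_pivots = defaultdict(set)  # source word -> its pivot translations
    tgt_pivots = defaultdict(set)  # target word -> pivot words that map to it
    for src, pivot in src_to_pivot:
        src_pivots[src].add(pivot)
    for pivot, tgt in pivot_to_tgt:
        tgt_pivots[tgt].add(pivot)

    entries = []
    for src, sp in src_pivots.items():
        for tgt, tp in tgt_pivots.items():
            shared = sp & tp
            if not shared:
                continue  # no common pivot word, so no evidence of a link
            score = 2 * len(shared) / (len(sp) + len(tp))
            if score >= threshold:
                entries.append((src, sorted(shared), tgt, round(score, 2)))
    return entries


# Hypothetical toy lists: Malay-English and English-French.
malay_english = [("kebun", "garden"), ("taman", "garden"), ("taman", "park")]
english_french = [("garden", "jardin"), ("yard", "jardin"), ("park", "parc")]

loose = merge_via_pivot(malay_english, english_french, threshold=0.5)
strict = merge_via_pivot(malay_english, english_french, threshold=0.6)
```

With these toy lists, the looser threshold keeps three trilingual entries, including the weaker link between "taman" and "jardin", while the stricter one keeps only two. This mirrors, in miniature, the trade-off between coverage and precision that the abstract attributes to varying threshold values.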
The work reported in this paper is supported by a Fundamental Research Grant (FRGS/1/10/TK/MMU/02/02) from the Malaysian Ministry of Higher Education. We thank the evaluators who participated in the results evaluation, and the two anonymous reviewers for their comments on improving this paper.