SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Abstract

JAKUBINA, Laurent and LANGLAIS, Philippe. A Comparison of Methods for Identifying the Translation of Words in a Comparable Corpus: Recipes and Limits. Comp. y Sist. [online]. 2016, vol.20, n.3, pp.449-458. ISSN 2007-9737. https://doi.org/10.13053/cys-20-3-2465.

Identifying translations in comparable corpora is a challenge that has long attracted many researchers. It has several applications, including Machine Translation and Cross-lingual Information Retrieval. In this study we compare three state-of-the-art approaches to this task: the so-called context-based projection method, the projection of monolingual word embeddings, and a method dedicated to identifying translations of rare words. We carefully explore the hyper-parameters of each method and measure their impact on the task of identifying the translation into French of English words in Wikipedia. Contrary to standard practice, we designed a test case in which we do not resort to heuristics to pre-select the target vocabulary among which to find translations, thereby pushing each method to its limit. We show that all the approaches we tested have a clear bias toward frequent words. In fact, the best approach we tested could identify the translation of a third of a set of frequent test words, while it could only translate around 10% of rare words.

Keywords: Comparable corpora; bilingual lexicon induction; distributional approaches; rare word translation.
