SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

JAKUBINA, Laurent and LANGLAIS, Philippe. A Comparison of Methods for Identifying the Translation of Words in a Comparable Corpus: Recipes and Limits. Comp. y Sist. [online]. 2016, vol.20, n.3, pp.449-458. ISSN 2007-9737. https://doi.org/10.13053/cys-20-3-2465.

Identifying translations in comparable corpora is a challenge that has long attracted researchers. It has several applications, including machine translation and cross-lingual information retrieval. In this study, we compare three state-of-the-art approaches to this task: the so-called context-based projection method, the projection of monolingual word embeddings, and a method dedicated to identifying translations of rare words. We carefully explore the hyper-parameters of each method and measure their impact on the task of identifying the French translations of English words in Wikipedia. Contrary to standard practice, we designed a test case in which we do not resort to heuristics to pre-select the target vocabulary among which to find translations, thereby pushing each method to its limit. We show that all the approaches we tested have a clear bias toward frequent words. In fact, the best approach we tested could identify the translation of a third of a set of frequent test words, while it could only translate around 10% of rare words.

Keywords: Comparable corpora; bilingual lexicon induction; distributional approaches; rare word translation.
