SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Abstract

COSTA-JUSSA, Marta R. Segmentation Strategies to Face Morphology Challenges in Brazilian-Portuguese/English Statistical Machine Translation and Its Integration in Cross-Language Information Retrieval. Comp. y Sist. [online]. 2015, vol.19, n.2, pp.357-370. ISSN 2007-9737. https://doi.org/10.13053/CyS-19-2-1550.

Morphology is particularly useful in statistical machine translation for reducing data sparseness and compensating for a lack of training data. In this work, we propose several approaches to introducing morphological knowledge into a standard phrase-based machine translation system. We segment words using two different tools (COGROO and MORFESSOR), which reduces both vocabulary size and data sparseness. We then add to these segmentations the morphological information of a POS language model. We combine all these approaches using a Minimum Bayes Risk strategy. Experiments show significant improvements of the enhanced system over the baseline system on the Brazilian-Portuguese/English language pair. Finally, we report a case study on the impact of enhancing the statistical machine translation system with morphology in a cross-language application system such as ONAIR, which allows users to search for information in video fragments through natural-language queries.

Keywords: Morphology; factored-based machine translation; cross-language information retrieval.
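
The abstract does not detail how the Minimum Bayes Risk combination is carried out, so the following is only a minimal sketch of the general idea: among the candidate translations produced by different systems, keep the one that agrees most, on average, with the others. The function names (`overlap_gain`, `mbr_select`), the n-gram overlap score used in place of BLEU, and the uniform candidate weights are all assumptions made for illustration, not the paper's actual setup.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_gain(hyp, ref, max_n=2):
    """Average n-gram F1 between two token lists (a crude stand-in for BLEU)."""
    scores = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        matched = sum((h & r).values())          # clipped n-gram matches
        total = sum(h.values()) + sum(r.values())
        scores.append(2 * matched / total if total else 0.0)
    return sum(scores) / len(scores)


def mbr_select(candidates, weights=None):
    """Return the candidate with the highest expected gain against the others.

    `weights` approximates the posterior over candidates; uniform weights
    are assumed here when none are given.
    """
    if weights is None:
        weights = [1.0 / len(candidates)] * len(candidates)
    tokenized = [c.split() for c in candidates]

    def expected_gain(i):
        return sum(
            w * overlap_gain(tokenized[i], other)
            for j, (other, w) in enumerate(zip(tokenized, weights))
            if j != i
        )

    best = max(range(len(candidates)), key=expected_gain)
    return candidates[best]
```

Maximizing expected gain is equivalent to minimizing expected loss when the loss is defined as one minus the gain. In a real system the candidates would be the outputs of the differently segmented translation systems, and the weights would come from their model scores.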


 

Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons License.