On-line version ISSN 1870-9044
Polibits no. 43, México, Jan./Jun. 2011
Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources
Marco Turchi and Maud Ehrmann
Joint Research Centre (JRC), IPSC GlobSec, European Commission, Via Fermi 2749, 21027, Ispra (VA), Italy (email: email@example.com).
Manuscript received November 2, 2010.
Manuscript accepted for publication January 14, 2011.
The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on its parallel training data: phrases that do not appear in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system using external morphological resources, without adding more parallel data. A set of new phrase associations is added to the translation and reordering models; each new association is a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved automatic scores (BLEU and Meteor) and fewer out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.
Key words: Machine translation, knowledge, morphological resources.
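As an illustration only, the idea of generating new phrase associations from morphological variants can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the toy lexicons, the two-character positional tags, the position-wise `tag_similarity` score, the `threshold` value, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon type: surface form -> (lemma, positional tag).
Lexicon = Dict[str, Tuple[str, str]]


def tag_similarity(tag_a: str, tag_b: str) -> float:
    """Fraction of positions on which two positional tags agree (assumed score)."""
    if not tag_a or not tag_b:
        return 0.0
    matches = sum(1 for a, b in zip(tag_a, tag_b) if a == b)
    return matches / max(len(tag_a), len(tag_b))


def variants(word: str, lexicon: Lexicon) -> List[str]:
    """Other surface forms in the lexicon that share the lemma of `word`."""
    if word not in lexicon:
        return []
    lemma = lexicon[word][0]
    return [w for w, (l, _) in lexicon.items() if l == lemma and w != word]


def expand_pair(src: List[str], tgt: List[str], src_pos: int, tgt_pos: int,
                src_lex: Lexicon, tgt_lex: Lexicon,
                threshold: float = 0.5) -> List[Tuple[List[str], List[str]]]:
    """Vary the aligned words at src_pos/tgt_pos and keep, for each source
    variant, the target variant whose tag is most similar (if above threshold)."""
    new_pairs = []
    for s_var in variants(src[src_pos], src_lex):
        s_tag = src_lex[s_var][1]
        best, best_score = None, threshold
        for t_var in variants(tgt[tgt_pos], tgt_lex):
            score = tag_similarity(s_tag, tgt_lex[t_var][1])
            if score >= best_score:
                best, best_score = t_var, score
        if best is not None:
            new_src = src[:src_pos] + [s_var] + src[src_pos + 1:]
            new_tgt = tgt[:tgt_pos] + [best] + tgt[tgt_pos + 1:]
            new_pairs.append((new_src, new_tgt))
    return new_pairs


# Toy data (assumed): "N" = noun, second character = number (s/p).
en_lex = {"house": ("house", "Ns"), "houses": ("house", "Np")}
fr_lex = {"maison": ("maison", "Ns"), "maisons": ("maison", "Np")}

print(expand_pair(["house"], ["maison"], 0, 0, en_lex, fr_lex))
```

The sketch varies a single aligned word pair and ignores agreement with the rest of the phrase (for example, articles or adjectives that must also change), as well as the scores that a real system would need to assign to the new entries in the translation and reordering models.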