Recommending Machine Translation Output to Translators by Estimating Translation Effort: A Case Study

Mathur, Prashant; Ruiz, Nick; Federico, Marcello

Polibits

On-line version ISSN 1870-9044

Polibits n.47 México Jan./Jul. 2013

Prashant Mathur¹, Nick Ruiz¹, and Marcello Federico²

¹ University of Trento and FBK, Italy.

² FBK, Italy.

Manuscript received on December 7, 2012.
Accepted for publication on January 11, 2013.

Abstract

In this paper we use the statistics provided by a field experiment to explore the utility of supplying machine translation suggestions in a computer-assisted translation (CAT) environment. Regression models are trained for each user in order to estimate the time to edit (TTE) for the current translation segment. We use a combination of features from the current segment and aggregated features from previously translated segments, selected with content-based filtering approaches commonly used in recommendation systems. We present and evaluate decision function heuristics to determine whether machine translation output will be useful for the translator in the given segment. We find that our regression models predict TTE reasonably well for some users given only a small number of training examples, although noise in the actual TTE for seemingly similar segments yields large error margins. We propose to include the estimation of TTE in CAT recommendation systems as a well-correlated metric for translation quality.

Key words: Machine translation, computer-assisted translation, quality estimation, recommender systems.
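
The pipeline summarized in the abstract (a per-user regression on features aggregated from similar past segments, followed by a decision heuristic) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Jaccard token overlap used as the content-based similarity, the single "neighbour TTE" feature, the ordinary least-squares fit, the time-budget heuristic `recommend_mt`, and the toy data.

```python
from statistics import mean


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two source segments, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def neighbour_tte(segment, history, k=2):
    """Aggregated feature: mean TTE of the k most similar past segments.

    `history` is a list of (source_text, tte_seconds) pairs for one user.
    """
    ranked = sorted(history, key=lambda h: jaccard(segment, h[0]), reverse=True)
    return mean(tte for _, tte in ranked[:k])


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant feature: fall back to the mean TTE
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def recommend_mt(predicted_tte, budget_seconds):
    """Hypothetical heuristic: suggest MT only if predicted TTE is under a budget."""
    return predicted_tte < budget_seconds


# Toy data for one user, invented for illustration: (source segment, TTE in seconds).
history = [
    ("open the file menu", 8.0),
    ("close the file menu", 9.0),
    ("the licence agreement applies to all users", 30.0),
    ("all users must accept the licence agreement", 34.0),
]

# Train the per-user model; the feature is computed leave-one-out so that a
# segment never serves as its own neighbour.
xs, ys = [], []
for i, (text, tte) in enumerate(history):
    others = history[:i] + history[i + 1:]
    xs.append(neighbour_tte(text, others, k=1))
    ys.append(tte)
intercept, slope = fit_line(xs, ys)

# Estimate TTE for a new segment and apply the decision heuristic.
new_segment = "open the edit menu"
predicted = intercept + slope * neighbour_tte(new_segment, history, k=1)
print(f"predicted TTE: {predicted:.1f}s, recommend MT: {recommend_mt(predicted, 15.0)}")
```

With so few training examples per user, a single noisy neighbour can dominate the estimate, which is consistent with the large error margins the abstract reports for seemingly similar segments.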

ACKNOWLEDGMENTS

This work is partially funded by the European Commission under the FP7 project MateCat, Grant 287688. The authors wish to thank Georgia Koutrika for her valuable suggestions in this experiment.
