A Graph-based Approach to Cross-language Multi-document Summarization

Boudin, Florian; Huet, Stéphane; Torres-Moreno, Juan-Manuel

Polibits

On-line version ISSN 1870-9044

Polibits n.43 México Jan./Jun. 2011

Florian Boudin¹, Stéphane Huet¹, and Juan-Manuel Torres-Moreno²

¹ Université d'Avignon, France.

² Université d'Avignon, France; École Polytechnique de Montréal, Canada; Universidad Nacional Autónoma de México, Mexico (e-mail: firstname.lastname@univ-avignon.fr).

Manuscript received November 9, 2010.
Manuscript accepted for publication January 15, 2011.

Abstract

Cross-language summarization is the task of generating a summary in a language different from the language of the source documents. In this paper, we propose a graph-based approach to multi-document summarization that integrates machine translation quality scores in the sentence extraction process. We evaluate our method on a manually translated subset of the DUC 2004 evaluation campaign. Results indicate that our approach improves the readability of the generated summaries without degrading their informativity.

Key words: Graph-based approach, cross-language multi-document summarization.
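
The general idea of combining graph-based sentence ranking with translation quality scores can be illustrated with a minimal sketch. The abstract does not state how the scores enter the ranking, so everything below is an assumption made for illustration: sentences are nodes, edges are weighted by bag-of-words cosine similarity, and a hypothetical per-sentence quality score biases the teleport step of a PageRank-style random walk. The function names and the `quality` parameter are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, quality, damping=0.85, iterations=50):
    """Score sentences with a random walk on a lexical similarity graph.

    `quality` holds one hypothetical, positive translation quality score
    per sentence. Using it as the teleport distribution is an illustrative
    assumption, not the formula used in the paper.
    """
    n = len(sentences)
    bags = [Counter(s.lower().split()) for s in sentences]
    # Edge weights: similarity between distinct sentences (no self-loops).
    weights = [[cosine(bags[i], bags[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    total_quality = sum(quality)
    prior = [q / total_quality for q in quality]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Mass held by sentences with no neighbours is redistributed
        # according to the quality prior so that scores keep summing to 1.
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0)
        updated = []
        for i in range(n):
            incoming = sum(scores[j] * weights[j][i] / out_sum[j]
                           for j in range(n) if out_sum[j] > 0)
            incoming += dangling * prior[i]
            updated.append((1 - damping) * prior[i] + damping * incoming)
        scores = updated
    return scores
```

Under these assumptions, calling `rank_sentences(sentences, quality)` returns one score per sentence, and a summary would be built by taking the top-ranked sentences. Two sentences that are equally central in the graph are then separated by their quality scores, which is one simple way to favour better-translated sentences without discarding graph centrality.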
