SciELO - Scientific Electronic Library Online




On-line version ISSN 1870-9044

Polibits  n.43 México Jan./Jun. 2011


A Graph–based Approach to Cross–language Multi–document Summarization


Florian Boudin1, Stéphane Huet1, and Juan–Manuel Torres–Moreno2


1 Université d'Avignon, France.

2 Université d'Avignon, France; École Polytechnique de Montréal, Canada; Universidad Nacional Autónoma de México, Mexico (e–mail: firstname.lastname@univ–


Manuscript received November 9, 2010.
Manuscript accepted for publication January 15, 2011.



Cross–language summarization is the task of generating a summary in a language different from the language of the source documents. In this paper, we propose a graph–based approach to multi–document summarization that integrates machine translation quality scores in the sentence extraction process. We evaluate our method on a manually translated subset of the DUC 2004 evaluation campaign. Results indicate that our approach improves the readability of the generated summaries without degrading their informativity.

Key words: Graph–based approach, cross–language multi–document summarization.
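
The abstract does not spell out how the translation quality scores enter the graph, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes sentences are nodes, edges carry bag-of-words cosine similarity, and each edge is scaled by a quality score in [0, 1] for the target sentence before a PageRank-style iteration. The function names, the `quality` list, and the weighting scheme are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, quality, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style walk over a similarity graph.

    Assumption (not from the paper): the edge i -> j is weighted by
    similarity(i, j) * quality[j], so sentences whose translations are
    predicted to be better attract more of the random walk.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Directed edge weights; no self-loops.
    weights = [
        [cosine(bags[i], bags[j]) * quality[j] if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            for j in range(n)
        ]
    return scores


if __name__ == "__main__":
    sents = [
        "storm hits coast city",
        "storm hits coast town",
        "storm hits coast village",
    ]
    # Hypothetical translation quality estimates, one per sentence.
    for sent, score in zip(sents, rank_sentences(sents, [0.9, 0.5, 0.1])):
        print(f"{score:.3f}  {sent}")
```

In this toy input the three sentences are equally similar to one another, so any difference in their scores comes from the quality estimates alone. A real system would also need a redundancy filter when assembling the summary from the top-ranked sentences.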






Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.