On-line version ISSN 1870-9044
Polibits no. 43, México, Jan./Jun. 2011
A Graph-based Approach to Cross-language Multi-document Summarization
Florian Boudin1, Stéphane Huet1, and Juan-Manuel Torres-Moreno2
1 Université d'Avignon, France.
2 Université d'Avignon, France; École Polytechnique de Montréal, Canada; Universidad Nacional Autónoma de México, Mexico (email: firstname.lastname@example.org).
Manuscript received November 9, 2010.
Manuscript accepted for publication January 15, 2011.
Cross-language summarization is the task of generating a summary in a language different from the language of the source documents. In this paper, we propose a graph-based approach to multi-document summarization that integrates machine translation quality scores in the sentence extraction process. We evaluate our method on a manually translated subset of the DUC 2004 evaluation campaign. Results indicate that our approach improves the readability of the generated summaries without degrading their informativity.
Key words: Graph-based approach, cross-language multi-document summarization.