Multi-document Summarization using Tensor Decomposition

Litvak, Marina; Vanetik, Natalia

doi:10.13053/CyS-18-3-2026

Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Comp. y Sist. vol.18 n.3 Ciudad de México Jul./Sep. 2014

https://doi.org/10.13053/CyS-18-3-2026

Regular articles

Marina Litvak and Natalia Vanetik

Shamoon College of Engineering, Beer Sheva, Israel. marinal@sce.ac.il, natalyav@sce.ac.il.

Article received on 31/12/2013.
Accepted on 12/02/2014.

Abstract

Extractive summarization of a document collection is the task of selecting a small subset of sentences that preserves the content and meaning of the original document set as well as possible. In this paper we present a new model for extractive summarization that aims to preserve as much of the information coverage of the original document set as possible. We construct a new tensor-based representation that describes the given document set in terms of its topics. We then rank the topics via tensor decomposition and compile a summary from the sentences of the highest-ranked topics.

Keywords: Tensor decomposition, multilingual multi-document summarization.
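
The pipeline outlined in the abstract (tensor representation, topic ranking via decomposition, sentence selection) can be illustrated with a minimal sketch. The abstract does not specify the tensor layout or the ranking rule, so everything here is an assumption made for illustration: the tensor is taken to be term x sentence x topic, topics are scored by their weight in the leading singular vector of the topic-mode unfolding (an HOSVD-style step), and the names `unfold`, `rank_topics`, and `summarize` are hypothetical rather than the authors' implementation.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def rank_topics(tensor):
    """Assumed ranking rule: score each topic by its absolute weight in the
    leading left singular vector of the topic-mode (axis 2) unfolding."""
    u, _, _ = np.linalg.svd(unfold(tensor, 2), full_matrices=False)
    scores = np.abs(u[:, 0])
    return np.argsort(-scores), scores

def summarize(tensor, sentences, sentence_topic, max_sentences):
    """Collect sentences topic by topic, highest-ranked topic first."""
    order, _ = rank_topics(tensor)
    summary = []
    for topic in order:
        for idx, text in enumerate(sentences):
            if sentence_topic[idx] == topic and len(summary) < max_sentences:
                summary.append(text)
    return summary

# Toy data (invented for this example): 4 terms, 3 sentences, 2 topics.
sentences = ["s0", "s1", "s2"]
sentence_topic = [0, 0, 1]              # assumed topic assignment per sentence
term_counts = np.array([[2, 1, 0],      # rows: terms, columns: sentences
                        [1, 2, 0],
                        [0, 0, 1],
                        [0, 0, 0]], dtype=float)

# tensor[i, j, k] = count of term i in sentence j if sentence j belongs to topic k.
tensor = np.zeros((4, 3, 2))
for j, k in enumerate(sentence_topic):
    tensor[:, j, k] = term_counts[:, j]

topic_order, topic_scores = rank_topics(tensor)
summary = summarize(tensor, sentences, sentence_topic, max_sentences=2)
print(topic_order, summary)
```

In this toy set, topic 0 carries most of the term mass, so its two sentences fill the summary first. A real system would also need topic detection and a length budget in words rather than sentences, neither of which is covered here.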

Acknowledgments

The authors are grateful to Igor Vinokur for the plugin implementation and technical support.
