
Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Comp. y Sist. vol.18 n.3 Ciudad de México Jul./Sep. 2014

https://doi.org/10.13053/CyS-18-3-2026 

Regular articles

 

Multi-document Summarization using Tensor Decomposition

 

Marina Litvak and Natalia Vanetik

 

Shamoon College of Engineering, Beer Sheva, Israel. marinal@sce.ac.il, natalyav@sce.ac.il.

 

Article received on 31/12/2013.
Accepted on 12/02/2014.

 

Abstract

Extractive summarization of a document collection is the task of selecting a small subset of sentences that preserves the content and meaning of the original document set as fully as possible. In this paper we present a new model for extractive summarization that aims to produce a summary preserving as much of the original document set's information coverage as possible. We construct a new tensor-based representation that describes the given document set in terms of its topics. We then rank the topics via tensor decomposition and compile a summary from the sentences of the highest-ranked topics.

Keywords: Tensor decomposition, multilingual multi-document summarization.
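
The pipeline outlined in the abstract (represent the document set as a tensor, rank topics by decomposition, pick sentences from the top topics) might be sketched roughly as follows. This is only an illustration under stated assumptions: the tensor layout (topics × sentences × terms), the scoring of topics by the leading singular vector of the topic-mode unfolding (the first factor of a higher-order SVD), the per-topic sentence weight, and all function names are hypothetical choices, not the authors' actual construction, which is not given on this page.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def rank_topics(tensor, topic_mode=0):
    """Rank topics by their loading on the leading left singular vector
    of the topic-mode unfolding (assumed scoring rule, for illustration)."""
    u, s, _ = np.linalg.svd(unfold(tensor, topic_mode), full_matrices=False)
    scores = np.abs(u[:, 0]) * s[0]
    return np.argsort(-scores), scores


def summarize(tensor, sentences, max_sentences=2):
    """Take the heaviest unused sentence from each topic, best topic first."""
    order, _ = rank_topics(tensor)
    chosen = []
    for topic in order:
        if len(chosen) == max_sentences:
            break
        # Hypothetical sentence weight: total term mass inside this topic.
        weights = tensor[topic].sum(axis=1)
        for idx in np.argsort(-weights):
            if weights[idx] > 0 and int(idx) not in chosen:
                chosen.append(int(idx))
                break
    # Emit the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


# Toy data (invented): 3 topics, 4 sentences, 5 terms.
toy = np.zeros((3, 4, 5))
toy[0, 0, 0] = 1.0      # weak topic 0, carried by sentence 0
toy[1, 1, 1:4] = 3.0    # dominant topic 1, carried mostly by sentence 1
toy[1, 2, 1] = 1.0
toy[2, 3, 4] = 2.0      # topic 2, carried by sentence 3
toy_sentences = ["s0", "s1", "s2", "s3"]

print(summarize(toy, toy_sentences, max_sentences=1))
```

In this toy tensor the three topics have disjoint support, so the leading singular vector of the unfolding aligns with the topic that has the largest total weight. Real data would mix topics, and a fuller decomposition such as a Tucker model would be a more faithful stand-in.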

 


 

Acknowledgments

The authors are grateful to Igor Vinokur for the plugin implementation and technical support.

 


All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.