Single-Document Keyphrase Extraction for Multi-Document Keyphrase Extraction

Berend, Gábor; Farkas, Richárd

Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Comp. y Sist. vol.17 n.2 Ciudad de México Apr./Jun. 2013

Articles

Gábor Berend¹ and Richárd Farkas²

¹ University of Szeged, Department of Informatics, Árpád tér 2., 6720 Szeged, Hungary berendg@inf.u-szeged.hu

² University of Szeged, Department of Informatics, Árpád tér 2., 6720 Szeged, Hungary rfarkas@inf.u-szeged.hu

Article received on 07/12/2012
Accepted on 13/01/2013.

Abstract

We address the task of assigning relevant terms to thematically and semantically related sub-corpora and achieve results superior to a baseline. Our results suggest that more reliable keyphrase sets can be assigned to such sub-corpora when keyphrases are first determined automatically for each individual document of the entire corpus. For the workshops in the ACL Anthology Corpus over a 6-year period, the keyphrase sets assigned by our proposed method were judged better than those of our baseline system in more than 60% of the test cases, when evaluated against an aggregation of different human judgements.

Keywords: Multi-document keyphrase extraction, knowledge management, information retrieval.
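The abstract states that per-document keyphrases are determined first and then used to label a sub-corpus, but it does not say how the per-document sets are combined. The following is only an illustrative sketch of that two-stage idea, not the authors' method: it assumes the single-document keyphrases are already available, and it uses a hypothetical scoring rule (share of sub-corpus documents carrying a phrase minus its share across the whole corpus). The function name `subcorpus_keyphrases`, the document ids, and the sample phrases are all invented for illustration.

```python
from collections import Counter


def subcorpus_keyphrases(doc_keyphrases, subcorpus_ids, top_n=5):
    """Rank keyphrases for a sub-corpus from per-document keyphrase sets.

    doc_keyphrases: dict mapping a document id to the set of keyphrases
        already extracted for that single document.
    subcorpus_ids: ids of the documents forming the sub-corpus
        (for example, the papers of one workshop).
    The scoring rule below is an assumption made for this sketch.
    """
    sub_ids = set(subcorpus_ids)
    inside = Counter()   # documents in the sub-corpus carrying the phrase
    overall = Counter()  # documents in the whole corpus carrying the phrase
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in set(phrases):
            overall[phrase] += 1
            if doc_id in sub_ids:
                inside[phrase] += 1

    n_sub = sum(1 for doc_id in doc_keyphrases if doc_id in sub_ids)
    n_all = len(doc_keyphrases)
    if n_sub == 0:
        return []

    # Favour phrases that are frequent in the sub-corpus but not corpus-wide.
    scores = {
        phrase: count / n_sub - overall[phrase] / n_all
        for phrase, count in inside.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda phrase: (-scores[phrase], phrase))
    return ranked[:top_n]


# Hypothetical toy corpus: two "workshops" with two papers each.
docs = {
    "w1-a": {"machine translation", "word alignment"},
    "w1-b": {"machine translation", "language model"},
    "w2-a": {"parsing", "language model"},
    "w2-b": {"parsing", "treebank"},
}
print(subcorpus_keyphrases(docs, ["w1-a", "w1-b"], top_n=2))
```

In this toy setting a phrase shared by both workshops ("language model") scores lower than one specific to the first workshop, which is the kind of behaviour a sub-corpus labelling step would presumably want; any real system would need a scoring rule validated against human judgements.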

Acknowledgments

This work was in part supported by the European Union and the European Social Fund through the project FuturICT.hu (grant no.: TÁMOP-4.2.2.C-11/1/KONV-2012-0013).
