SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

MATIAS MENDOZA, Griselda Areli et al. Ground Truth Spanish Automatic Extractive Text Summarization Bounds. Comp. y Sist. [online]. 2020, vol.24, n.3, pp.1241-1256.  Epub 09-Jun-2021. ISSN 2007-9737.  https://doi.org/10.13053/cys-24-3-3484.

Textual information is growing at an accelerating rate in the languages most spoken by native Internet users, such as Chinese, Spanish, English, Arabic, Hindi, Portuguese, Bengali, and Russian, among others. This makes it necessary to innovate in Automatic Text Summarization (ATS) methods, which extract the essential information so that the entire text need not be read. The most competent methods are Extractive ATS (EATS) methods, which extract essential parts of the document (sentences, phrases, or paragraphs) to compose a summary. Over the last 60 years of EATS research, the creation of standard corpora with human-generated summaries, together with evaluation methods that correlate highly with human judgments, has helped increase the number of new state-of-the-art methods. However, these methods mainly support the English language, leaving aside other equally important languages such as Spanish, which is the second most spoken language by native speakers and the third most used on the Internet. A standard corpus for Spanish EATS (SAETS) is created to evaluate the state-of-the-art methods and systems for the Spanish language. The main contribution is a proposal for the configuration and evaluation of five state-of-the-art methods, five systems, and four heuristics using three evaluation methods (ROUGE, ROUGE-C, and Jensen-Shannon divergence). This is the first time that Jensen-Shannon divergence has been used to evaluate AETS. This paper presents the ground truth bounds for the Spanish language, which are the heuristics baseline:first, baseline:random, topline, and concordance. In addition, a ranking of 30 evaluation tests of the state-of-the-art methods and systems is calculated, forming a benchmark for SAETS.
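As a rough illustration of two of the heuristics and one of the evaluation measures named in the abstract, the sketch below builds a baseline:first and a baseline:random extract and scores a summary against its source with Jensen-Shannon divergence over word distributions. It is a minimal sketch under stated assumptions, not the configuration used in the paper: whitespace tokenization, lowercasing, base-2 logarithms, and all function names (`baseline_first`, `baseline_random`, `js_divergence`, `js_between_texts`) are hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def baseline_first(sentences, n):
    """baseline:first heuristic: take the first n sentences of the document."""
    return sentences[:n]


def baseline_random(sentences, n, seed=0):
    """baseline:random heuristic: pick n sentences at random, kept in document order.

    The seed parameter is an assumption added here for reproducibility.
    """
    rng = random.Random(seed)
    k = min(n, len(sentences))
    chosen = sorted(rng.sample(range(len(sentences)), k))
    return [sentences[i] for i in chosen]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two aligned probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_between_texts(summary, source):
    """JS divergence between the word distributions of a summary and its source.

    Assumes naive whitespace tokenization and that both texts are non-empty.
    """
    s_counts = Counter(summary.lower().split())
    d_counts = Counter(source.lower().split())
    vocab = sorted(set(s_counts) | set(d_counts))
    s_total = sum(s_counts.values())
    d_total = sum(d_counts.values())
    p = [s_counts[w] / s_total for w in vocab]
    q = [d_counts[w] / d_total for w in vocab]
    return js_divergence(p, q)


if __name__ == "__main__":
    doc = [
        "El resumen automatico extrae oraciones del texto.",
        "Las oraciones se ordenan por relevancia.",
        "El resultado se compara con resumenes humanos.",
    ]
    summary = " ".join(baseline_first(doc, 1))
    print(js_between_texts(summary, " ".join(doc)))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared vocabulary), so a lower score indicates that the summary's word distribution is closer to that of the source.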

Keywords: Spanish automatic text summarization; ROUGE; ROUGE-C; Jensen-Shannon divergence; corpus TER.
