SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

Online version ISSN 2007-9737. Print version ISSN 1405-5546

Abstract

HUETLE FIGUEROA, Juan; PEREZ TELLEZ, Fernando and PINTO, David. On Detecting Keywords for Concept Mapping in Plain Text. Comp. y Sist. [online]. 2020, vol.24, n.2, pp.651-668. Epub 04-Oct-2021. ISSN 2007-9737. https://doi.org/10.13053/cys-24-2-3400.

Key terminology is important in scientific work, especially in the field of Natural Language Processing. However, there is no optimal way to extract all key terminology reliably, so it is important to develop automatic methods for extracting key terms. This paper presents a way to obtain key terminology based on labels assigned manually by an expert in the area. We obtained part-of-speech (POS) tags for each label and derived from them POS patterns of key terminology, which were later used as filters. Experiment 1 compared the manually obtained labels with the labels produced by the proposed approach, using 60% of the corpus for training and 40% for testing; the patterns were evaluated with three measures: precision, recall, and F-measure. Experiment 2 used three measures for ranking N-grams (sequences of terms): pointwise mutual information, likelihood ratio, and chi-square. To obtain the best N-grams, Experiment 3 combined intersections of the N-gram sets selected by these measures with filtering by POS patterns. The results were compared with the manually labeled set using the same evaluation measures, giving good recall and acceptable precision and F-measure. In Experiment 4, the POS patterns were tested on a much larger corpus from a different domain, obtaining slightly higher results.
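
The pipeline summarized above (rank N-grams with an association measure, keep those matching POS patterns, score against a manually labeled set) can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only bigrams and only pointwise mutual information, and the function names, the Penn Treebank style tags, the POS patterns, and the toy data are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent word pair by pointwise mutual information (log base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def keyword_candidates(tagged_tokens, pos_patterns):
    """Keep bigrams whose tag pair is in pos_patterns, ranked by PMI (highest first)."""
    words = [word for word, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]
    scores = bigram_pmi(words)
    kept = set()
    for i in range(len(words) - 1):
        if (tags[i], tags[i + 1]) in pos_patterns:
            kept.add((words[i], words[i + 1]))
    return sorted(kept, key=lambda bigram: scores[bigram], reverse=True)


def precision_recall_f1(predicted, gold):
    """Compare extracted terms against a manually labeled set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy input: tokens already tagged by some POS tagger.
tagged = [
    ("natural", "JJ"), ("language", "NN"), ("processing", "NN"),
    ("is", "VBZ"), ("a", "DT"), ("research", "NN"), ("field", "NN"),
]
# Hypothetical POS patterns, standing in for those learned from expert labels.
patterns = {("JJ", "NN"), ("NN", "NN")}
# Hypothetical manually labeled key terms.
gold = {("natural", "language"), ("research", "field"), ("concept", "map")}

candidates = keyword_candidates(tagged, patterns)
print(candidates)
print(precision_recall_f1(candidates, gold))
```

A fuller version would add the likelihood-ratio and chi-square rankings, intersect the top-ranked sets from each measure before applying the POS filter, and extend the scoring beyond bigrams.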

Keywords: Collocations; n-grams; POS; keyword extraction.
