SciELO - Scientific Electronic Library Online

 



Investigación bibliotecológica

Online version ISSN 2448-8321; print version ISSN 0187-358X

Abstract

POLO BAUTISTA, Luis Roberto and MARTINEZ ACEVEDO, Karen Vanessa. Algorithm for thematic analysis of digital documents. Investig. bibl [online]. 2021, vol.35, n.89, eib0895841901. Epub 22-Mar-2022. ISSN 2448-8321. https://doi.org/10.22201/iibi.24488321xe.2021.89.58419.

The objective of this article is to present an algorithm that assigns subject areas to digital documents and serves as a support tool for thematic analysis in the organization of information, with a view to its use in the development of controlled vocabularies. The methodology applied Optical Character Recognition (OCR) and Latent Dirichlet Allocation (LDA) as the main tools for an algorithm written in the Python programming language, which reads PDF files in order to obtain the main themes of a textual corpus. The results of applying the algorithm demonstrate its usefulness for indexing, as a system that identifies and extracts relevant topics from a specific document in electronic format and allows the information professional to automate processes. The article concludes that the algorithm can be used to develop alternative access points based on the content of texts.

Keywords: Latent Dirichlet Allocation; Algorithms; Thematic Analysis; Digital Documents.
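
To make the LDA step described in the abstract more concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the OCR stage has already turned each PDF into a plain-text string, and it fits LDA with a small collapsed Gibbs sampler written with the standard library only. The function names (`tokenize`, `lda_gibbs`, `top_terms`), the hyperparameter values, the stopword set, and the sample texts are all illustrative assumptions; the abstract does not say which library or settings the authors used.

```python
import random
from collections import Counter


def tokenize(text, stopwords=frozenset()):
    """Lowercase the text and keep alphabetic tokens longer than two characters."""
    words = (w.strip(".,;:()[]\"'").lower() for w in text.split())
    return [w for w in words if w.isalpha() and len(w) > 2 and w not in stopwords]


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling. `docs` is a list of token lists.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # tokens assigned to topic k
    assign = []                                         # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_terms(topic_word, n=5):
    """Return the n most frequent words of each topic (candidate subject terms)."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


if __name__ == "__main__":
    # Hypothetical stand-ins for text produced by the OCR stage.
    pages = [
        "Libraries catalog books and index documents for library users.",
        "Indexing documents helps library users find books in the catalog.",
        "Topic models estimate word distributions with sampling algorithms.",
        "Sampling algorithms estimate topic distributions over words.",
    ]
    stop = frozenset({"and", "for", "the", "with", "over"})
    docs = [tokenize(p, stop) for p in pages]
    doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
    for k, terms in enumerate(top_terms(topic_word)):
        print(f"topic {k}: {', '.join(terms)}")
```

In an indexing workflow, the top terms of each topic would be candidate subject terms for review by the information professional, and the per-document topic counts indicate which topics dominate each document. A production pipeline would more likely rely on an established LDA library and a real stopword list; the hand-written sampler is used here only to keep the sketch self-contained.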
