SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Abstract

ROJAS-SIMON, Jonathan; LEDENEVA, Yulia and GARCIA-HERNANDEZ, René Arnulfo. A Dimensionality Reduction Approach for Text Vectorization in Detecting Human and Machine-generated Texts. Comp. y Sist. [online]. 2024, vol.28, n.4, pp.1919-1929. Epub 25-Mar-2025. ISSN 2007-9737. https://doi.org/10.13053/cys-28-4-5214.

Distinguishing between human and machine-generated texts has recently attracted interest in Natural Language Processing (NLP), especially given the malicious use of Large Language Models (LLMs). As a result, several state-of-the-art methods and approaches have been proposed, with promising results. However, some of them cannot reliably explain how features influence the detection of human and machine-generated texts. In this context, previous studies have explored the effectiveness of traditional machine learning algorithms using lexical features based on ASCII character codes. Nevertheless, not all of these features are used, which may hinder the task. Therefore, in this paper, we propose a dimensionality reduction of these features to improve the performance of this text vectorization with traditional machine learning algorithms. The proposed dimensionality reduction has been tested on the AuTexTification task with English and Spanish documents. According to the results, reducing the dimensionality of the features improves the performance of machine-learning algorithms, and the resulting vectorization can serve as input to more advanced machine-learning algorithms.
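
The abstract does not spell out the vectorization or the reduction procedure, so the following Python sketch is only an illustration under stated assumptions: each document is represented by the relative frequency of the 128 seven-bit ASCII codes, and the reduction simply drops codes that never occur in the corpus. The function names (ascii_vectorize, used_columns, reduce_vectors) and the toy documents are hypothetical and are not taken from the paper.

```python
from collections import Counter

ASCII_SIZE = 128  # assumption: 7-bit ASCII codes 0..127


def ascii_vectorize(text):
    """Relative frequency of each ASCII code in `text` (list of length 128)."""
    counts = Counter(ord(ch) for ch in text if ord(ch) < ASCII_SIZE)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * ASCII_SIZE
    return [counts.get(code, 0) / total for code in range(ASCII_SIZE)]


def used_columns(vectors):
    """Indices of ASCII codes that are non-zero in at least one document."""
    return [j for j in range(ASCII_SIZE) if any(v[j] != 0 for v in vectors)]


def reduce_vectors(vectors, keep):
    """Keep only the selected feature columns (hypothetical reduction step)."""
    return [[v[j] for j in keep] for v in vectors]


# Toy corpus, purely illustrative.
docs = ["Hello, world!", "hello again"]
vectors = [ascii_vectorize(d) for d in docs]
keep = used_columns(vectors)
reduced = reduce_vectors(vectors, keep)
print(len(vectors[0]), "->", len(reduced[0]))
```

The reduced vectors could then be passed to any traditional classifier. Dropping unused columns is only one possible reduction and may differ from the one evaluated in the paper.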

Keywords: Large-language models (LLMs); machine learning algorithms; ASCII-based text vectorization; dimensionality reduction; AuTexTification.
