SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

Online version ISSN 2007-9737. Print version ISSN 1405-5546.

Abstract

STARCHENKO, Vladimir. No Need to Get Wasteful: The Way to Train a Lightweight Competitive Spelling Checker Using (Concentrated) Synthetic Datasets. Comp. y Sist. [online]. 2024, vol.28, n.4, pp.1865-1877.  Epub 25-Mar-2025. ISSN 2007-9737.  https://doi.org/10.13053/cys-28-4-5068.

This study focuses on spelling correction, which remains problematic for modern error correction systems. Building on the T5 architecture, we create a lightweight spell-checking tool that can be used in combination with a large language model (LLM) and significantly improves the overall result of the error correction system. It also performs competitively with other recently developed spell-checking tools despite being considerably smaller. The model's high performance results from introducing two synthetic datasets: one with a high density of spelling errors and one with errors that are more difficult to correct.
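
To make the idea of a synthetic dataset with a controllable density of spelling errors concrete, the following is a minimal sketch in Python. It is not the procedure used in the paper, which the abstract does not describe. The function names `corrupt_word` and `make_pair`, the four character-level edit operations, and the `density` parameter are all assumptions introduced here for illustration.

```python
import random
import string


def corrupt_word(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit (a hypothetical error model).

    The result can occasionally equal the input, for example when a letter
    is substituted by itself or two identical letters are transposed.
    """
    if len(word) < 2:
        return word  # too short to edit safely
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    i = rng.randrange(len(word) - 1)
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]
    if op == "substitute":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]
    # transpose the characters at positions i and i + 1
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def make_pair(sentence: str, density: float, seed: int = 0) -> tuple[str, str]:
    """Return a (noisy, clean) training pair.

    `density` is the assumed probability that each whitespace-separated
    word is passed through `corrupt_word`.
    """
    rng = random.Random(seed)
    words = sentence.split()
    noisy = [corrupt_word(w, rng) if rng.random() < density else w for w in words]
    return " ".join(noisy), sentence


if __name__ == "__main__":
    noisy, clean = make_pair("the quick brown fox jumps over the lazy dog", density=0.8)
    print(noisy)
    print(clean)
```

Under these assumptions, raising `density` toward 1.0 gives a corpus in which nearly every word is a candidate for corruption, while a dataset of "more difficult" errors would require a different, more targeted error model than the uniform random edits shown here.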

Keywords: Spelling errors; spelling check; grammatical error correction; preprocessing; synthetic datasets.
