SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

ABRAMOV, Aleksei V.; IVANOV, Vladimir V. and SOLOVYEV, Valery D. Lexical Complexity Evaluation based on Context for Russian Language. Comp. y Sist. [online]. 2023, vol.27, n.1, pp.127-139. Epub 16-Jun-2023. ISSN 2007-9737. https://doi.org/10.13053/cys-27-1-4528.

The task of identifying complex words within a context, usually referred to as Complex Word Identification (CWI) or Lexical Complexity Prediction (LCP), is a vital component of Lexical Simplification pipelines. The accuracy of complexity estimation depends on the features used (hand-crafted features, word embeddings, and the presence of surrounding context) as well as on the rules or models applied (manually designed filtering, classic machine learning models, recurrent neural networks, and Transformer-based models). To our knowledge, most existing work on CWI and LCP investigates the properties of English words and texts, accompanied by studies of German, Spanish, French, and Hindi, with little to no attention paid to Russian. In this paper, we present a study of lexical complexity estimation for the Russian language that investigates two questions: how well do the morphological, semantic, and syntactic properties of a word represent its complexity, and does the surrounding context significantly affect the accuracy of complexity estimation? We briefly describe the dataset of lexical complexity in context based on the Russian Synodal Bible and expand it with a dataset of morphological, semantic, and syntactic features for the annotated words. Additionally, we present linear regression and RuBERT models as baselines for lexical complexity estimation.

Keywords: Lexical complexity; Russian language; Bible; corpus; Wiktionary.
