SciELO - Scientific Electronic Library Online

 
Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Abstract

WARJRI, Sunita; PAKRAY, Partha; LYNGDOH, Saralin A. and MAJI, Arnab Kumar. Identification of POS Tags for the Khasi Language based on Brill’s Transformation Rule-Based Tagger. Comp. y Sist. [online]. 2022, vol.26, n.2, pp.989-1005. Epub 10-Mar-2023. ISSN 2007-9737. https://doi.org/10.13053/cys-26-2-4058.

Khasi is a Mon-Khmer language belonging to the Austro-Asiatic language family. It is spoken by the indigenous people of the state of Meghalaya in north-eastern India. The main purpose of this paper is to develop a part-of-speech (POS) tagger for the Khasi language using a rule-based approach. POS tagging requires a grammatically tagged corpus, but the Khasi language does not have a standard corpus for POS tagging. A second aim of this paper is therefore to develop a Khasi lexicon, or POS corpus, and to use Brill’s rule-based transformation tagger to automatically tag a given Khasi text. While anticipating the challenges in building such a corpus, this paper presents an analysis based on a Khasi corpus of around 103,998 words in its initial phase. We also show how the Khasi corpus was created. Applying Brill’s transformation-based rule learning to the created corpus in this preliminary study yielded accuracies of 97.73% and 95.52% on the validation and test data, respectively. This work is the first attempt to investigate POS tagging using a rule-based model with the designed Khasi POS corpus.
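
As a rough illustration of the transformation-based learning idea named in the abstract, the following Python sketch starts from a most-frequent-tag baseline and greedily learns correction rules of a single form: "change tag A to B when the previous tag is C". It is not the authors' implementation. The toy English sentences, the Penn-Treebank-style tags, the single rule template, and all function names are assumptions chosen only to keep the example small; a real Brill tagger uses many rule templates and a far larger corpus.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data (English, not Khasi), for illustration only.
TRAIN = [
    [("the", "DT"), ("race", "NN")],
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")],
    [("a", "DT"), ("race", "NN")],
]

def train_baseline(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words, lexicon, default="NN"):
    """Initial-state tagger: lexicon lookup with a default for unknown words."""
    return [lexicon.get(w, default) for w in words]

def apply_rule(tags, rule):
    """Rule (a, b, prev): change tag a to b when the preceding tag is prev."""
    a, b, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == prev:
            out[i] = b
    return out

def learn_rules(tagged_sents, lexicon, max_rules=10):
    """Greedily pick the rule with the best net error reduction each round."""
    gold = [[t for _, t in s] for s in tagged_sents]
    current = [baseline_tag([w for w, _ in s], lexicon) for s in tagged_sents]
    rules = []
    for _ in range(max_rules):
        scores = Counter()
        # Credit: each remaining error proposes the rule that would fix it.
        for cur, ref in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != ref[i]:
                    scores[(cur[i], ref[i], cur[i - 1])] += 1
        # Penalty: a rule loses a point wherever it would break a correct tag.
        for cur, ref in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] == ref[i]:
                    for a, b, prev in list(scores):
                        if cur[i] == a and cur[i - 1] == prev:
                            scores[(a, b, prev)] -= 1
        if not scores:
            break
        best, gain = max(scores.items(), key=lambda kv: kv[1])
        if gain <= 0:
            break
        rules.append(best)
        current = [apply_rule(c, best) for c in current]
    return rules

def tag(words, lexicon, rules):
    """Tag with the baseline, then apply the learned rules in order."""
    tags = baseline_tag(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags

lexicon = train_baseline(TRAIN)
rules = learn_rules(TRAIN, lexicon)
print(rules)
print(tag(["they", "want", "to", "race"], lexicon, rules))
```

On this toy data the baseline tags every "race" as a noun, and the learner recovers one rule that retags a noun as a verb after "to". Accuracy figures such as those reported in the abstract would come from comparing the output of a function like `tag` against held-out gold annotations.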

Keywords: Natural language processing (NLP); computational linguistics; part-of-speech (POS); POS tagging; Khasi language; Khasi corpus; lexical morphology; transformation rule-based tagging.
