Morpheme based Language Model for Tamil Part-of-Speech Tagging

Lakshmana Pandian, S.; Geetha, T. V.


Polibits

On-line version ISSN 1870-9044

Polibits n.38 México Jul./Dec. 2008

Special section: natural language processing

Morpheme based Language Model for Tamil Part–of–Speech Tagging

S. Lakshmana Pandian and T. V. Geetha

Department of Computer Science and Engineering, Anna University, Chennai, India. (lpandian72@yahoo.com).

Manuscript received May 12, 2008.
Manuscript accepted for publication October 25, 2008.

Abstract

This paper describes Tamil part-of-speech (POS) tagging using a corpus-based approach that formulates a language model from the morpheme components of words. Rule-based tagging, Markov model taggers, Hidden Markov Model taggers, and transformation-based learning taggers are some of the methods available for part-of-speech tagging. We present a language model that categorizes a word's part of speech using three pieces of information: the stem type, the last morpheme, and the morpheme preceding the last one. To estimate the contribution factors of the model, we follow the generalized iterative scaling technique. The presented model achieves an overall F-measure of 96%.

Key words: Bayesian learning, language model, morpheme components, generalized iterative scaling.
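As an illustration only, the kind of model summarized in the abstract can be sketched as a small log-linear classifier whose weights are fitted with generalized iterative scaling (GIS). This is a minimal sketch under stated assumptions, not the authors' implementation: the feature layout, the function names (`features`, `predict_proba`, `train_gis`, `tag_word`), the toy morpheme labels, and the tag names are all hypothetical, and the paper's exact formulation of its contribution factors may differ.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: ((stem_type, prev_morpheme, last_morpheme), tag).
# The labels are invented for illustration and are not taken from the paper.
TOY_DATA = [
    (("noun_stem", "plural", "case"), "NOUN"),
    (("noun_stem", "none", "case"), "NOUN"),
    (("verb_stem", "tense", "png"), "VERB"),
    (("verb_stem", "tense", "participle"), "VERB"),
]


def features(stem_type, prev_morph, last_morph, tag):
    """Three binary features per (word, tag) pair, one per morpheme component."""
    return [
        ("stem", stem_type, tag),
        ("prev", prev_morph, tag),
        ("last", last_morph, tag),
    ]


def predict_proba(weights, tags, x):
    """Conditional tag distribution p(tag | x) under the log-linear model."""
    scores = [sum(weights.get(f, 0.0) for f in features(*x, t)) for t in tags]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {t: e / total for t, e in zip(tags, exps)}


def train_gis(data, iterations=50):
    """Fit feature weights with generalized iterative scaling.

    Assumption: at most 3 features fire per (word, tag) pair, so C = 3.
    Features never observed in training keep weight 0 (a simplification;
    no correction feature is used).
    """
    tags = sorted({t for _, t in data})
    c = 3.0
    empirical = defaultdict(float)
    for x, t in data:
        for f in features(*x, t):
            empirical[f] += 1.0
    weights = {f: 0.0 for f in empirical}
    for _ in range(iterations):
        expected = defaultdict(float)
        for x, _ in data:
            for t, p in predict_proba(weights, tags, x).items():
                for f in features(*x, t):
                    if f in weights:
                        expected[f] += p
        # GIS update: move each weight by (1/C) * log(empirical / expected).
        for f in weights:
            weights[f] += math.log(empirical[f] / expected[f]) / c
    return weights, tags


def tag_word(weights, tags, x):
    """Return the most probable tag for one analysed word."""
    probs = predict_proba(weights, tags, x)
    return max(probs, key=probs.get)
```

Calling `train_gis(TOY_DATA)` returns the fitted weights and the tag set, which can then be passed to `tag_word` together with a `(stem_type, prev_morpheme, last_morpheme)` triple. The three-feature layout simply mirrors the three information sources named in the abstract; a real tagger would derive these components from a morphological analyzer and train on an annotated corpus.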

