Clause Boundary Identification using Classifier and Clause Markers in Urdu Language

Parveen, Daraksha; Sanyal, Ratna; Ansari, Afreen

Polibits

On-line version ISSN 1870-9044

Polibits n.43 México Jan./Jun. 2011

Daraksha Parveen*, Ratna Sanyal**, and Afreen Ansari***

Indian Institute of Information Technology, Allahabad, India (e-mail: *daraksha.parveen3022@gmail.com; **rsanyal@iiita.ac.in; ***afreen.aa@gmail.com).

Manuscript received November 2, 2010.
Manuscript accepted for publication January 12, 2011.

Abstract

This paper presents a method for identifying clause boundaries in Urdu. We use a Conditional Random Field (CRF) as the classifier, together with clause markers. The clause markers are used to detect the type of subordinate clause, which may accompany the main clause or be embedded within it. When testing on different sentences reveals misclassifications, additional rules are identified to improve recall and precision. The results show that this approach efficiently determines the type of subordinate clause and its boundary.

Key words: Clause marker, conditional random field.
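To make the idea in the abstract concrete, the following is a minimal sketch of how clause markers might be exposed as features to a sequence classifier such as a CRF, and how per-token labels could be turned into clause boundaries. It is not the authors' implementation. The transliterated marker list, the clause-type names, the feature names, and the `B-SUB`/`I-SUB`/`O` label scheme are all assumptions made for illustration, and the labels in the example are written by hand in place of real classifier output.

```python
# Illustrative sketch only. The marker list, type names, feature names, and
# label scheme below are assumptions, not taken from the paper.

# Hypothetical, transliterated Urdu clause markers mapped to assumed clause types.
CLAUSE_MARKERS = {
    "ki": "complement",     # "that"
    "jo": "relative",       # "who / which"
    "agar": "conditional",  # "if"
    "jab": "temporal",      # "when"
    "kyunki": "causal",     # "because"
}


def token_features(tokens, i):
    """Build a feature dict for token i, the form many CRF toolkits accept."""
    word = tokens[i]
    return {
        "word": word,
        "is_marker": word in CLAUSE_MARKERS,
        "marker_type": CLAUSE_MARKERS.get(word, "none"),
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def clause_spans(labels):
    """Convert per-token labels into (start, end) spans, end exclusive.

    "B-SUB" opens a subordinate clause, "I-SUB" continues it, "O" is outside.
    An "I-SUB" with no open clause is ignored.
    """
    spans, start = [], None
    for i, label in enumerate(labels):
        if label == "B-SUB":
            if start is not None:      # close a clause that was still open
                spans.append((start, i))
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:              # clause runs to the end of the sentence
        spans.append((start, len(labels)))
    return spans


# "us ne kaha ki woh kal aaye ga" ("He said that he will come tomorrow")
tokens = ["us", "ne", "kaha", "ki", "woh", "kal", "aaye", "ga"]
# Hand-written labels standing in for the output of a trained classifier.
labels = ["O", "O", "O", "B-SUB", "I-SUB", "I-SUB", "I-SUB", "I-SUB"]

features = [token_features(tokens, i) for i in range(len(tokens))]
for start, end in clause_spans(labels):
    print(features[start]["marker_type"], tokens[start:end])
```

In this sketch the marker at the start of each span supplies the assumed clause type, while the span itself gives the boundary. A rule layer of the kind the abstract mentions could then post-process spans that the classifier gets wrong.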
