Polibits

Online version ISSN 1870-9044

Polibits no. 46, México, Jul./Dec. 2012

 

Lexical Disambiguation of Arabic Language: An Experimental Study

 

Laroussi Merhben, Anis Zouaghi, and Mounir Zrigui

 

Unité de Recherche en Technologies de l'Information et de la Communication of the Réseau National Universitaire Tunisien, Tunisia (e-mail: aroussi_merhben@hotmail.com; Anis.Zouaghi@gmail.com; mounir.zrigui@fsm.rnu.tn).

 

Manuscript received June 18, 2012.
Manuscript accepted for publication July 24, 2012.

 

Abstract

In this paper we test several supervised algorithms that are widely cited in the word sense disambiguation literature. Because linguistic resources for the Arabic language are scarce, we work on a non-annotated corpus; with the help of four annotators, we annotated the samples containing the ambiguous words. We then test the Naïve Bayes algorithm, decision lists, and the exemplar-based algorithm. In the experimental study, we measure how disambiguation quality is affected by the window size, by derivation, and by the smoothing technique used for the (2n+1)-grams. In these tests, the exemplar-based algorithm achieves the highest precision.

Key words: Supervised algorithms, training data, Naïve Bayes, decision list, exemplar-based algorithm, window size.
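
To make the role of the window size and of smoothing concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a (2n+1)-gram means the ambiguous word plus n words on each side, and it uses add-one (Laplace) smoothing as a stand-in, since the abstract does not name the smoothing technique. The names `context_window` and `NaiveBayesWSD`, the sense labels, and the toy English data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, index, n):
    """Return the n words on each side of tokens[index], target excluded."""
    left = tokens[max(0, index - n):index]
    right = tokens[index + 1:index + 1 + n]
    return left + right


class NaiveBayesWSD:
    """Bag-of-words Naive Bayes over context words (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant (assumed add-one)
        self.sense_counts = Counter()           # number of training samples per sense
        self.word_counts = defaultdict(Counter) # context word counts per sense
        self.vocab = set()

    def train(self, samples):
        """samples: iterable of (context_words, sense_label) pairs."""
        for context, sense in samples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(context)
            self.vocab.update(context)

    def predict(self, context):
        """Return the sense with the highest smoothed log-probability."""
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[sense].values()) + self.alpha * vocab_size
            for word in context:
                score += math.log((self.word_counts[sense][word] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy data for the ambiguous English word "bank".
tokens = "he sat on the bank of the river and fished".split()
window = context_window(tokens, tokens.index("bank"), n=2)

model = NaiveBayesWSD()
model.train([
    (["river", "water", "shore"], "river_bank"),
    (["money", "loan", "account"], "financial_bank"),
    (["river", "fish"], "river_bank"),
])
prediction = model.predict(["loan", "money"])
```

A larger `n` gives the classifier more context words but also more noise, which is the trade-off the window-size experiments examine; without smoothing, any context word unseen with a given sense would drive that sense's probability to zero.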

 


 

