Polibits

On-line version ISSN 1870-9044

Polibits  n.47 México Jan./Jul. 2013

 

Automatic WordNet Construction Using Markov Chain Monte Carlo

 

Marzieh Fadaee, Hamidreza Ghader, Heshaam Faili, and Azadeh Shakery

 

All authors are with the School of ECE, College of Engineering, University of Tehran, Tehran, Iran; Heshaam Faili and Azadeh Shakery are also with the School of Computer Science, Institute for Research in Fundamental Science (IPM), P.O. Box 19395-5746, Tehran, Iran (e-mail: m.fadaee@ut.ac.ir, h.ghader@ut.ac.ir, hfaili@ut.ac.ir, shakery@ut.ac.ir).

 

Manuscript received on December 7, 2012; accepted for publication on January 11, 2013.

 

Abstract

WordNet is used extensively as a major lexical resource in information retrieval tasks. However, the quality of existing Persian WordNets is far from perfect. They are constructed either manually, which limits their coverage of Persian words, or automatically, which results in unsatisfactory precision. This paper presents a fully automated approach for constructing a Persian WordNet: a Bayesian model with Markov chain Monte Carlo (MCMC) estimation. We model the problem of constructing a Persian WordNet as estimating the probability of assigning senses (synsets) to Persian words. By applying MCMC techniques to estimate these probabilities, we integrate prior knowledge into the estimation and use the expected value of the generated samples as the final estimates. This yields a substantial performance improvement over Maximum-Likelihood and Expectation-Maximization methods. The resulting WordNet has a precision of 90.46%, a considerable improvement over existing automatically built Persian WordNets.

Key words: Semantic network, WordNet, ontology, Bayesian inference, Markov chain Monte Carlo, Persian.
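
To illustrate the general idea of estimating word-to-synset probabilities by MCMC, the following is a minimal sketch of a collapsed Gibbs sampler, not the authors' actual model. It assumes a hypothetical setup in which each corpus token is a (word, context) pair, each word has a fixed list of candidate synsets, and symmetric Dirichlet priors `alpha` and `beta` encode the prior knowledge. The function name, the toy data, and all parameter values are illustrative assumptions. The final estimate for each word is the average, over the retained samples, of the posterior mean of its sense distribution.

```python
import random
from collections import Counter, defaultdict


def gibbs_sense_probabilities(tokens, candidates, alpha=0.5, beta=0.1,
                              n_iter=200, burn_in=50, seed=0):
    """Estimate P(synset | word) with a collapsed Gibbs sampler (illustrative).

    tokens:     list of (word, context) pairs observed in a corpus
    candidates: dict mapping each word to its list of candidate synsets
    alpha/beta: symmetric Dirichlet hyperparameters (assumed values)
    """
    rng = random.Random(seed)
    vocab_size = len({c for _, c in tokens})
    n_w = Counter(w for w, _ in tokens)   # tokens per word
    n_ws = defaultdict(int)               # (word, synset) counts
    m_sc = defaultdict(int)               # (synset, context) counts
    m_s = defaultdict(int)                # tokens per synset

    # Random initial assignment of one synset to every token.
    z = []
    for w, c in tokens:
        s = rng.choice(candidates[w])
        z.append(s)
        n_ws[w, s] += 1
        m_sc[s, c] += 1
        m_s[s] += 1

    totals = defaultdict(float)
    kept = 0
    for it in range(n_iter):
        for i, (w, c) in enumerate(tokens):
            # Remove the current assignment of token i from the counts.
            s = z[i]
            n_ws[w, s] -= 1
            m_sc[s, c] -= 1
            m_s[s] -= 1
            # Conditional probability of each candidate synset given the rest.
            weights = [
                (n_ws[w, t] + alpha)
                * (m_sc[t, c] + beta) / (m_s[t] + beta * vocab_size)
                for t in candidates[w]
            ]
            s = rng.choices(candidates[w], weights=weights)[0]
            z[i] = s
            n_ws[w, s] += 1
            m_sc[s, c] += 1
            m_s[s] += 1
        if it >= burn_in:
            # Accumulate the posterior mean of each word's sense distribution.
            kept += 1
            for w, cands in candidates.items():
                denom = n_w[w] + alpha * len(cands)
                for t in cands:
                    totals[w, t] += (n_ws[w, t] + alpha) / denom

    # Expected value over the retained samples.
    return {w: {t: totals[w, t] / kept for t in cands}
            for w, cands in candidates.items()}


# Toy data with placeholder names (hypothetical, not from the paper).
candidates = {"w1": ["s1", "s2"], "w2": ["s2", "s3"]}
tokens = [("w1", "c1"), ("w1", "c1"), ("w1", "c2"),
          ("w2", "c2"), ("w2", "c2"), ("w2", "c3")]
probs = gibbs_sense_probabilities(tokens, candidates)
for word, dist in probs.items():
    print(word, dist)
```

Averaging over samples after a burn-in period, rather than taking a single point estimate, is what distinguishes this kind of estimate from a Maximum-Likelihood or Expectation-Maximization fit; the hyperparameters act as pseudo-counts that keep unseen word-synset pairs from receiving zero probability.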

 

ACKNOWLEDGMENTS

We acknowledge the support of the Research Institute for ICT. This research was supported in part by a grant from IPM (No. CS1391-4-19).

 

Creative Commons License: All contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.