Supervised Learning Algorithms Evaluation on Recognizing Semantic Types of Spanish Verb-Noun Collocations

Gelbukh, Alexander; Kolesnikova, Olga

Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 16 no. 3, Mexico City, Jul./Sep. 2012

Regular articles

Evaluación de algoritmos de aprendizaje supervisado para reconocimiento de las clases semánticas de colocaciones verbo-sustantivo en español

Alexander Gelbukh and Olga Kolesnikova

¹ Centro de Investigación en Computación, Instituto Politécnico Nacional, México DF, México. gelbukh@gelbukh.com, kolesolga@gmail.com

Article received on 07/02/2011;
accepted on 27/09/2011.

Abstract

The meaning of verb-noun collocations such as the wind blows, time flies, and the day passes by can be generalized as 'what is designated by the noun exists'. Likewise, the meaning of make a decision, provide support, and write a letter can be generalized as 'make what is designated by the noun'. Such generalizations capture the meaning of whole groups of collocations and can serve as semantic annotation. Our objective is to evaluate how well several existing supervised machine learning methods annotate Spanish collocations with generalized meanings of this kind. The experimental results show that supervised learning methods reach accuracy high enough for use in high-quality semantic annotation.

Keywords. Collocations, semantic annotation, supervised machine learning.

Resumen

The meaning of verb-noun collocations such as the wind blows (el viento sopla), time flies (el tiempo vuela), and the day passes (el día pasa) can be generalized and expressed by the pattern 'what the noun denotes exists'. Similarly, the meaning of make a decision (tomar la decisión), provide support (proporcionar apoyo), and write a letter (escribir una carta) can be generalized as 'do what the noun denotes'. These generalizations represent the meaning of certain groups of collocations and can be used as semantic annotation. Our objective is to evaluate supervised machine learning algorithms for labeling Spanish verb-noun collocations with the proposed semantic annotation. The results show that the methods used achieve high accuracy and can be used to label collocations with the semantic information represented by the generalized meaning.

Palabras clave: Colocaciones, anotación semántica, aprendizaje de máquina supervisado.

Acknowledgements

We are grateful to Adam Kilgarriff and Vojtěch Kovář for providing us with a list of verb-noun pairs from the Spanish Web Corpus of the Sketch Engine, www.sketchengine.co.uk.

The work was done with partial support from the Mexican Government: SNI, COFAA-IPN, PIFI-IPN, CONACYT grant 50206-H, and SIP-IPN grant 20100773.

A shorter version of this paper appeared in MICAI-2010.
