Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol.16 n.3 México Jul./Sep. 2012


Artículos regulares


Supervised Learning Algorithms Evaluation on Recognizing Semantic Types of Spanish Verb-Noun Collocations


Evaluación de algoritmos de aprendizaje supervisado para reconocimiento de las clases semánticas de colocaciones verbo-sustantivo en español


Alexander Gelbukh and Olga Kolesnikova


Centro de Investigación en Computación, Instituto Politécnico Nacional, México DF, México


Article received on 07/02/2011;
accepted on 27/09/2011. 



The meaning of verb-noun collocations such as the wind blows, time flies, and the day passes by can be generalized as 'what is designated by the noun exists'. Likewise, the meaning of make a decision, provide support, and write a letter can be generalized as 'make what is designated by the noun'. Such generalizations capture the meaning shared by a group of collocations and can serve as semantic annotation. Our objective is to evaluate how well existing supervised machine learning methods annotate Spanish collocations with generalized meanings of this kind. The experimental results show that supervised learning methods reach accuracy high enough to be used for high-quality semantic annotation.

Keywords. Collocations, semantic annotation, supervised machine learning.
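The annotation task described above can be illustrated with a minimal sketch. It is not the experimental setup evaluated in the paper: the label names `EXIST` and `MAKE`, the two-feature representation (verb lemma and noun lemma), and the choice of a hand-written naive Bayes classifier with Laplace smoothing are all assumptions made for illustration. The training pairs are the Spanish counterparts of the six example collocations.

```python
import math
from collections import Counter, defaultdict

# Toy training set: the six example collocations, as (verb lemma, noun lemma).
# The tags EXIST and MAKE are hypothetical names for the two generalized meanings.
TRAIN = [
    (("soplar", "viento"), "EXIST"),
    (("volar", "tiempo"), "EXIST"),
    (("pasar", "día"), "EXIST"),
    (("tomar", "decisión"), "MAKE"),
    (("proporcionar", "apoyo"), "MAKE"),
    (("escribir", "carta"), "MAKE"),
]


def features(pair):
    """Assumed feature set: the verb lemma and the noun lemma."""
    verb, noun = pair
    return [f"verb={verb}", f"noun={noun}"]


def train(examples):
    """Collect class counts, per-class feature counts, and the feature vocabulary."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for pair, label in examples:
        class_counts[label] += 1
        for feat in features(pair):
            feature_counts[label][feat] += 1
            vocabulary.add(feat)
    return class_counts, feature_counts, vocabulary


def classify(pair, model):
    """Return the label with the highest Laplace-smoothed log-probability."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(feature_counts[label].values()) + len(vocabulary)
        for feat in features(pair):
            score += math.log((feature_counts[label][feat] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
# Unseen combinations of a known verb and a known noun from the same group.
print(classify(("escribir", "decisión"), model))  # MAKE
print(classify(("pasar", "tiempo"), model))       # EXIST
```

With only surface lemmas as features, such a classifier cannot generalize to verbs and nouns absent from the training data; a realistic system would need richer features and far more annotated examples.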



The meaning of verb-noun collocations such as el viento sopla (the wind blows), el tiempo vuela (time flies), and el día pasa (the day passes) can be generalized and expressed by the pattern 'what the noun denotes exists'. Similarly, the meaning of tomar la decisión (make a decision), proporcionar apoyo (provide support), and escribir una carta (write a letter) can be generalized as 'do what the noun denotes'. These generalizations represent the meaning of certain groups of collocations and can be used as semantic annotation. Our objective is to evaluate supervised machine learning algorithms for tagging Spanish verb-noun collocations with the proposed semantic annotation. The results show that the methods used achieve high accuracy and can be used to tag collocations with the semantic information represented by the generalized meaning.

Keywords: Collocations, semantic annotation, supervised machine learning.





We are grateful to Adam Kilgarriff and Vojtěch Kovář for providing us with a list of verb-noun pairs from the Spanish Web Corpus of the Sketch Engine.

The work was done with partial support from the Mexican Government: SNI, COFAA-IPN, PIFI-IPN, CONACYT grant 50206-H, and SIP-IPN grant 20100773.

A shorter version of the paper has already appeared in MICAI-2010.




All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.