Classification of Semantic Roles Using Syntactic, Semantic, and Contextual Features

Reyes, José A.; Montes, Azucena; González, Juan G.; Pinto, David E.

Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Comp. y Sist. vol. 17 n. 2, Ciudad de México, Apr./Jun. 2013

Articles

Classifying Case Relations using Syntactic, Semantic and Contextual Features

José A. Reyes¹, Azucena Montes², Juan G. González³, and David E. Pinto⁴

¹ Centro Nacional de Investigación y Desarrollo Tecnológico, Mexico. alexreyes06c@cenidet.edu.mx

² Centro Nacional de Investigación y Desarrollo Tecnológico, Mexico, and Universidad Nacional Autónoma de México, Mexico. amr@cenidet.edu.mx, amontesr@iingen.unam.mx

³ Centro Nacional de Investigación y Desarrollo Tecnológico, Mexico. gabriel@cenidet.edu.mx

⁴ Benemérita Universidad Autónoma de Puebla, Mexico. dpinto@cs.buap.mx

Article received on 16 October 2012
Accepted on 3 April 2013

Abstract

This paper presents a classification of semantic roles based on syntactic, semantic, and contextual features. The aim is to identify, through classification, the type of semantic role that holds between an event and its actants; to that end, we carry out a feature analysis to select a feature subset that improves performance on the task. In addition, we compare four classification algorithms: support vector machines (SVM), k-nearest neighbors (k-NN), a Bayes classifier, and the C4.5 decision tree classifier. The comparison analyzes how each algorithm performs with all features and with only the relevant features for each semantic role category. The experiments show that feature selection improves classification performance: with the group of relevant features, the best performance, 84.6%, is obtained with the C4.5 decision tree classifier. The resulting role labels can be used for knowledge representation or to support ontology learning.

Keywords: Semantic role classification, knowledge acquisition, natural language processing, machine learning.
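
The effect of feature selection described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the role labels, the function names, the choice of a 1-nearest-neighbor classifier with leave-one-out evaluation, and the greedy forward (wrapper-style) search. It is not the feature set, corpus, or selection procedure used in the article.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: feature 0 separates the two roles, feature 1 is
# large-scale noise that is unrelated to the role label.
X: List[List[float]] = [
    [0.0, 5.0], [0.1, 1.0], [0.2, 9.0],   # labeled "agent"
    [1.0, 5.1], [1.1, 1.1], [1.2, 9.1],   # labeled "patient"
]
y: List[str] = ["agent", "agent", "agent", "patient", "patient", "patient"]


def squared_distance(a: Sequence[float], b: Sequence[float],
                     features: Sequence[int]) -> float:
    """Squared Euclidean distance restricted to the given feature indices."""
    return sum((a[i] - b[i]) ** 2 for i in features)


def loo_accuracy(X: List[List[float]], y: List[str],
                 features: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    correct = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others,
                      key=lambda j: squared_distance(X[i], X[j], features))
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_select(X: List[List[float]],
                   y: List[str]) -> Tuple[List[int], float]:
    """Greedy forward selection: add the feature that most improves
    leave-one-out accuracy, and stop when no candidate improves it."""
    selected: List[int] = []
    best = -1.0  # guarantees that the first feature is always added
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda s: s[0])
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


if __name__ == "__main__":
    print("all features:", loo_accuracy(X, y, [0, 1]))
    print("selected subset and accuracy:", forward_select(X, y))
```

On this toy data the noisy feature dominates the distance and misleads the nearest-neighbor classifier, while the greedy search keeps only feature 0. Real data would call for cross-validation over a labeled corpus and a comparison across several classifiers, as the abstract describes.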
