Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 15 no. 2, Ciudad de México, Oct./Dec. 2011

 

Articles

 

Document Level Emotion Tagging: Machine Learning and Resource Based Approach

 

Etiquetación de emociones a nivel de documento: aprendizaje automático y un método basado en recursos

 

Dipankar Das and Sivaji Bandyopadhyay

 

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India. E-mail: dipankar.dipnil2005@gmail.com, sivaji_cse_ju@yahoo.com

 

Article received on 11/15/2010.
Accepted 05/06/2011.

 

Abstract

This work addresses the identification of emotions in Bengali blog documents using two separate approaches. The first is a machine learning approach that builds document level information from sentence level information, which is in turn derived from word level detail. The second is a resource based approach that relies on the Bengali WordNet Affect, a word level affective lexical resource for Bengali. In the first approach, a Support Vector Machine (SVM) classifier performs the word level classification. A sense weight based average scoring technique then determines the sentential emotion scores from the emotion tagged word level constituents. The cumulative sum of the sentential emotion scores is assigned to each document, taking into account combinations of various heuristic features. The second approach classifies a given document by majority, using the Bengali WordNet Affect lists. In both approaches, instead of a single emotion tag, the best two emotion tags are assigned to each document according to the ordered emotion scores. Applying the best feature combination obtained from the development set, evaluation on 110 test documents yields average F-Scores of 59.50% and 51.07% for the two approaches, respectively, over all emotion classes.

Keywords: Natural language processing, computational linguistics, text, blog, document, WordNet Affect, sense weight score, CRF, SVM, emotion tagging, heuristic features.
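
The word-to-sentence-to-document aggregation described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the sense weight values, the exact averaging formula, or the heuristic features, so the weights, the emotion labels, and the function names `sentence_scores`, `document_scores`, and `best_two` are hypothetical. The word level tags are taken as given (in the paper they come from an SVM classifier), and the heuristic features are left out.

```python
from collections import Counter, defaultdict


def sentence_scores(word_tags, sense_weight):
    """Score one sentence: per emotion, its sense weight averaged over the
    emotion-tagged words of the sentence (an assumed form of the averaging)."""
    tagged = [tag for tag in word_tags if tag in sense_weight]
    if not tagged:
        return {}
    counts = Counter(tagged)
    return {emotion: sense_weight[emotion] * n / len(tagged)
            for emotion, n in counts.items()}


def document_scores(sentences, sense_weight):
    """Sum the sentence-level emotion scores over all sentences of a document."""
    totals = defaultdict(float)
    for word_tags in sentences:
        for emotion, score in sentence_scores(word_tags, sense_weight).items():
            totals[emotion] += score
    return dict(totals)


def best_two(scores):
    """Return the two highest-scoring emotion tags (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [emotion for emotion, _ in ranked[:2]]


# Toy input: each inner list holds the word-level tags of one sentence;
# "neutral" stands for a word without an emotion tag.
SENSE_WEIGHT = {"happy": 0.5, "sad": 0.25}  # hypothetical weights
DOCUMENT = [["happy", "neutral", "happy"], ["sad", "happy"], ["sad"]]

doc_scores = document_scores(DOCUMENT, SENSE_WEIGHT)
print(doc_scores)
print(best_two(doc_scores))
```

Keeping the three levels in separate functions mirrors the granularity described in the abstract: any other sentence scoring rule could replace `sentence_scores` without touching the document level summation or the selection of the two best tags.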

 

Summary

The aim of this work is to identify emotions in documents written in Bengali and extracted from a blog, using two different approaches. The first approach is machine learning, in which document information is accumulated from sentences obtained through word analysis, that is, at the most granular level. The second approach is based on resources, of which we use the Bengali WordNet Affect, a lexical resource that includes Bengali words tagged with emotions. In the first approach, the Support Vector Machine (SVM) is used for word level classification. The affective value of the sentences is computed with a technique based on averaging the weight scores assigned to the senses of the emotion tagged words in those sentences. The cumulative sum of the affective scores of the sentences is assigned to each document, taking into account various heuristic features. The second approach implements a majority based method to classify a given document using the Bengali WordNet Affect lists. In both approaches, instead of assigning a single affective tag to a given document, the two best affective tags are assigned to each document according to the ordered affective scores obtained. Using the best feature combination obtained from the development set, the evaluation of 110 test documents gives average F-score values of 59.50% and 51.07% for the two approaches, respectively, over all emotion classes.

Keywords: Natural language processing, computational linguistics, text, blog, document, WordNet Affect, sense weight score, Conditional Random Field (CRF), Support Vector Machine (SVM), affective tags, heuristic features.
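
The second, resource based approach can be sketched in a similar spirit. This is only an illustration under stated assumptions: the abstract says that a document is classified by majority over the Bengali WordNet Affect lists but gives no further detail, so the counting rule, the tie-breaking, the function name `majority_tags`, and the tiny English word lists standing in for the Bengali resource are all hypothetical.

```python
from collections import Counter


def majority_tags(tokens, affect_lists, top_n=2):
    """Count how many document tokens fall in each emotion's word list and
    return the top_n emotions by count (ties broken alphabetically)."""
    counts = Counter()
    for token in tokens:
        for emotion, words in affect_lists.items():
            if token in words:
                counts[emotion] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [emotion for emotion, _ in ranked[:top_n]]


# Placeholder lexicon: English words stand in for the Bengali entries.
AFFECT_LISTS = {
    "happy": {"joy", "glad"},
    "sad": {"grief"},
    "fear": {"dread"},
}
TOKENS = ["joy", "glad", "grief", "the", "joy", "dread", "grief"]

print(majority_tags(TOKENS, AFFECT_LISTS))
```

With `top_n=2` the function returns the two best emotion tags per document, matching the output format the two approaches share.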

 


Creative Commons License: All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.