Extracting Phrases Describing Problems with Products and Services from Twitter Messages

Gupta, Narendra K.


Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol.17 n.2 Ciudad de México Apr./Jun. 2013

Articles


Extracción de frases que describan problemas con productos y servicios de mensajes Twitter

Narendra K. Gupta

AT&T Labs - Research, Inc., Florham Park, NJ 07932, USA. ngupta@research.att.com

Article received on 03/12/2012
Accepted on 11/01/2013.

Abstract

Social media contain many types of information useful to businesses. In this paper we discuss a trigger-target based approach to extracting descriptions of problems from Twitter data. These descriptions are factual statements, as opposed to subjective opinions about products and services. We first identify the problem tweets, i.e., the tweets that contain descriptions of problems. We then extract the phrases that describe the problem. In our approach, such descriptions are extracted as a combination of trigger and target phrases. Triggers are mostly domain-independent verb phrases and are identified using hand-crafted lexical and syntactic patterns. Targets, on the other hand, are domain-specific noun phrases syntactically related to the triggers. We frame the task of finding the target phrase corresponding to a trigger phrase as a ranking problem and report the results of experiments with maximum entropy classifiers and voted perceptrons. Both approaches outperform the previously reported rule-based approach.

Keywords: Social media, information extraction, text classification.
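To make the ranking formulation in the abstract concrete, here is a minimal sketch of a voted perceptron that ranks candidate target phrases for one trigger. It is an illustration under stated assumptions, not the paper's implementation: the feature names (`dep=dobj`, `dist=1`, and so on), the toy training examples, and all function names are hypothetical, and the real system's features and training setup are not described on this page.

```python
from collections import Counter


def score(weights, features):
    """Dot product between a sparse weight vector and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_index(weights, candidates):
    """Index of the highest-scoring candidate (ties go to the earliest one)."""
    return max(range(len(candidates)),
               key=lambda i: (score(weights, candidates[i]), -i))


def train_voted_perceptron(examples, epochs=5):
    """Train a ranking perceptron, keeping every intermediate weight vector.

    examples: list of (candidates, gold_index) pairs; candidates is a list of
    feature dicts, one per candidate target phrase for a single trigger.
    Returns a list of (weights, survival_count) pairs.
    """
    weights, survived, history = {}, 0, []
    for _ in range(epochs):
        for candidates, gold in examples:
            predicted = best_index(weights, candidates)
            if predicted == gold:
                survived += 1
                continue
            # Mistake: store the old vector with its vote count, then update
            # toward the gold candidate and away from the wrong prediction.
            history.append((dict(weights), survived))
            for name, value in candidates[gold].items():
                weights[name] = weights.get(name, 0.0) + value
            for name, value in candidates[predicted].items():
                weights[name] = weights.get(name, 0.0) - value
            survived = 1
    history.append((dict(weights), survived))
    return history


def rank_by_votes(history, candidates):
    """Each stored vector votes for its top candidate, weighted by survival count."""
    votes = Counter()
    for weights, count in history:
        votes[best_index(weights, candidates)] += count
    return max(range(len(candidates)), key=lambda i: (votes[i], -i))


# Hypothetical toy data: each dict describes one candidate noun phrase relative
# to the trigger (dependency relation and token distance are assumed features).
examples = [
    ([{"dep=dobj": 1.0, "dist=1": 1.0}, {"dep=nsubj": 1.0, "dist=2": 1.0}], 0),
    ([{"dep=none": 1.0, "dist=3": 1.0}, {"dep=dobj": 1.0, "dist=1": 1.0}], 1),
    ([{"dep=nsubj": 1.0, "dist=1": 1.0}, {"dep=dobj": 1.0, "dist=2": 1.0}], 1),
]
history = train_voted_perceptron(examples)

unseen = [{"dep=nsubj": 1.0, "dist=2": 1.0}, {"dep=dobj": 1.0, "dist=3": 1.0}]
print("chosen candidate index:", rank_by_votes(history, unseen))
```

A maximum entropy ranker would fit the same framing: each candidate's feature dict is scored, and the highest-scoring candidate is taken as the target. Only the training objective changes.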

Resumen (English translation)

Social media contain many types of information useful to businesses. This article considers a "trigger-target" approach to extracting descriptions of problems from Twitter data. It is important to mention that descriptions of problems are statements of fact, unlike subjective opinions about products/services. First, the problem tweets are identified, that is, the tweets that contain descriptions of problems. In the proposed approach, such descriptions are extracted as a combination of trigger and target phrases. Triggers are mostly domain-independent verb phrases and are identified by means of manually created lexical and syntactic patterns. Targets, on the other hand, are noun phrases specific to the particular domain and syntactically related to the triggers. The problem of finding the target phrase corresponding to a given trigger phrase is approached as a ranking problem, and the results of experiments with maximum entropy classifiers and voted perceptrons are presented. Both approaches perform better than the previously reported rule-based approach.

Keywords: Social media, information extraction, text classification.

