SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

Print version ISSN 1405-5546

Abstract

ASNANI, Kavita and PAWAR, Jyoti D. Extraction of Code-mixed Aspect Topics in Semantic Representation. Comp. y Sist. [online]. 2018, vol.22, n.1, pp.55-63. ISSN 1405-5546. http://dx.doi.org/10.13053/cys-22-1-2771.

With the recent growth and popularity of social networking forums, millions of people connected to the World Wide Web commonly communicate in multiple languages. This has produced large volumes of unstructured code-mixed social media text in which useful aspects of information are highly dispersed. Aspect-based opinion mining relates opinion targets to their polarity values in a specific context. Because aspects are often implicit, detecting and retrieving them is a difficult task, and code-mixed social media text adds linguistic complexities of its own. Topic modeling is a standard approach with the potential to extract aspects pertaining to opinion data from large text collections; it not only retrieves implicit aspects but also clusters them together. In this paper we propose a knowledge-based, language-independent code-mixed semantic LDA (lcms-LDA) model that aims to improve the coherence of clusters. We find that the proposed lcms-LDA model infers topic distributions without a language barrier, based on the semantics associated with words. Our experimental results show an increase in the UMass and KL divergence scores, indicating improved coherence and distinctiveness of aspect clusters in comparison with the state-of-the-art techniques used for aspect extraction from code-mixed data.

Keywords: Code-mixed aspect extraction; knowledge-based topic modeling; semantic clustering; BabelNet; language independent semantic word association.
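
The abstract names two evaluation scores, UMass and KL divergence, without giving their formulas. The sketch below shows how such scores are commonly computed for a topic model; it is an illustration under stated assumptions, not the authors' evaluation code. It assumes the usual UMass definition (sum of log((D(w_m, w_l) + 1) / D(w_l)) over ranked word pairs, where D counts documents) and a symmetric KL divergence between two topic-word distributions as a distinctiveness measure. The function names, the smoothing constant `eps`, and the toy romanized Hindi-English documents are all hypothetical.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: topic words ordered from most to least probable.
    documents: iterable of token lists (the reference corpus).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # assumption: skip words that never occur
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + 1) / d_l)
    return score


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is a hypothetical smoothing floor that avoids division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrised KL; larger values mean more distinct topics."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Toy code-mixed corpus (invented for illustration only).
docs = [
    ["camera", "accha", "battery"],
    ["camera", "battery", "kharab"],
    ["screen", "accha"],
]
coherence = umass_coherence(["camera", "battery"], docs)
distinctiveness = symmetric_kl([0.9, 0.1], [0.1, 0.9])
```

Under these definitions, a higher UMass score means the top words of a topic co-occur more often in the corpus, and a higher divergence between topic-word distributions means the topics overlap less. How the paper aggregates these scores across topics is not stated in the abstract.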
