1 Introduction
Twitter has become one of the most popular social media platforms for people to share their attitudes and thoughts. It also allows people to communicate with each other through follower and friend relationships [1]. As the number of Twitter users grows rapidly, interactions among users increase markedly, and users with many followers gain greater influence. These interactions further create opportunities for companies to conduct online marketing activities [2].
The influence and reach of traditional newspapers and other media have been measured by their number of subscribers. On Twitter, however, subscribers express various opinions through mentions. Recent research on social networks analyzes the characteristics of tweets to infer occurrences, extract events, and detect influential users [3,4,5]. These studies measured influence through followers in the social network and regarded a follower as a supporter of a target user (a followee).
In our observation, however, followers can have different purposes in forming a following relationship, which leads to different responses to the followee’s tweets. Some followers reply or retweet only on particular issues, while others reply, retweet, or leave mentions on all tweets. Moreover, followers are not positive toward every opinion the followee expresses in a tweet; they may show a negative opinion toward the target user on particular issues. Therefore, it is not appropriate to classify every follower as a supporter.
On the other hand, when a famous user writes tweets on Twitter, people tend to listen to the opinions of influential users. These users play an important mediating role in the spread of tweets. We call these users influential transmitters. An influential transmitter can be regarded as a representative of the supporting followers.
In this paper, we propose follower polarity classification based on detecting influential transmitters, and follower opinion classification per social issue based on clustering social issues. To evaluate the effectiveness of the proposed method, experiments are conducted on a Korean tweet collection.
The paper is organized as follows: Section 2 describes related work. Section 3 presents our classification model. Section 4 shows experimental results. Finally, we conclude in Section 5.
2 Related Work
Generally, influence means a change in people’s cognition, attitude, and behavior. Socio-scientific studies of influence have approached it from various angles, such as related network analysis, expected theory, and persuasive process research.
Research on measuring user influence in Twitter [3] applied and compared various measures, such as the number of followers, the number of retweets, and the number of mentions, to examine who is influential in the Twitter space. Weng et al. [4] proposed TwitterRank, an extension of PageRank that identifies influential Twitter users by taking both the link structure and topical similarity into consideration.
Social influence is generally measured with graph-based algorithms such as PageRank and HITS. To find influential authors in brand-page communities, Purohit et al. [5] measured the influence of Twitter users by the number of retweets, replies, and mentions based on the HITS algorithm. In our method, we focus on detecting influential transmitters rather than influential users, in order to expand the information about followers’ activities toward a target user for follower polarity classification.
Research on sentiment classification is divided into methods that utilize external resources and methods that utilize internal resources. Methods using external resources, such as sentiment dictionaries and search snippets, improve the confidence of a collected corpus. Methods using internal resources are based on syllable n-grams or sliding windows [6]; their weakness is that they depend on the corpus. In our method, we use bigram and trigram syllables to recognize positive and negative sentiments after Korean morphological analysis, and use support vector machines to classify the sentiments of tweets.
3 Analyzing Follower Behaviors
To analyze a follower’s intention in the following relationship and the follower’s opinion on a particular issue toward a target user, followers are classified as supporters, non-supporters, or neutral. As an anchor, influential transmitters are detected as strong supporters. Then the opinion of a follower is classified for each social issue as positive, negative, or neutral toward the opinion of the target user. To detect social issues, issue clusters are constructed based on retweet lifetime. The overall system architecture is shown in Figure 1.
3.1 Follower Polarity Classification Using Influential Transmitters
Depending on a follower’s intention in following a target user, the follower’s behavior toward the target user’s tweets may differ. A follower can be classified as a supporter, a non-supporter, or neutral toward a followee.
Even when a follower supports a target user, the follower’s opinion can be negative toward the target user’s opinion on particular issues.
Based on our observation, there are important users with many followers who, as strong supporters, spread information in order to support a target user. We call such a user an influential transmitter for the target user. An influential transmitter plays an important mediating role in spreading the opinion of a target user, and thereby influences their own followers and other people.
Moreover, most followers tend to co-follow the target user and the influential transmitter. When a user retweets a tweet of the influential transmitter, it can be regarded as a retweet of a tweet of the target user. To expand the information about followers’ activities, the activities among influential transmitters and a target user are taken into account when analyzing the polarity of a follower. Since influential transmitters can represent followers, we can predict the opinion of all followers from their responses to the influential transmitters.
3.1.1 Detecting Influential Transmitters
Influential transmitters are different from influential users. The method for extracting influential authors [5] selects the most influential users by retweets, mentions, and replies among all users. In contrast, the influential transmitters extracted in this research are users who play an important role in spreading the followee’s tweets or opinions in order to support the followee.
Since the conventional HITS algorithm can only detect influential users, we revised the edges of the HITS graph to express the retweeting relationship more precisely, applying the concept of social contagion to detect influential transmitters. Social contagion refers to the tendency for a behavior exhibited by one user to be copied by other users who have been exposed to media coverage describing the behavior of the target user [12].
When a follower retweets a tweet of a target user, an edge is connected from the follower only to the transmitter, not to the original author of the tweet. The conditions for connecting an edge are as follows.
A follows a target user T. B follows A and T.
Both A and B retweet a tweet of T.
The retweet sequence is that B retweets a tweet of T after A does.
The authority and hub scores are calculated on the HITS graph as follows:
where w_ij is the weight of edge e_ij, and e_ij is the number of followees who retweeted a tweet of j before i did, when i retweets a tweet of j.
When the conditions described above are satisfied, user A has an effect on user B when B retweets a tweet of T. Therefore, whereas the related work [5] creates the edges A->T and B->T, the proposed method creates an edge B->A so as to reflect the transmitter. By reflecting these flows of retweeting relationships in the HITS algorithm, the top-ranked followers with high authority scores are detected as influential transmitters. Figure 4 shows the results of detecting influential transmitters.
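The edge construction and scoring described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the input format (a `follows` mapping and time-ordered retweeter lists), the helper names `build_transmitter_edges` and `weighted_hits`, and the choice of adding one unit of weight per qualifying retweet are all hypothetical, and the update rule is the standard weighted HITS iteration with L2 normalization.

```python
from collections import defaultdict
from math import sqrt


def build_transmitter_edges(target, follows, retweets_by_tweet):
    """Create weighted edges B -> A (hypothetical helper).

    follows: dict mapping a user to the set of users they follow.
    retweets_by_tweet: list of retweeter lists, one per tweet of the
        target, each ordered by retweet time (earliest first).
    An edge B -> A is added when A follows T, B follows A and T,
    and B retweets the tweet of T after A does.
    """
    weights = defaultdict(float)
    for retweeters in retweets_by_tweet:
        for pos, b in enumerate(retweeters):
            for a in retweeters[:pos]:  # users who retweeted before b
                if (target in follows.get(a, set())
                        and target in follows.get(b, set())
                        and a in follows.get(b, set())):
                    weights[(b, a)] += 1.0  # assumed: one unit per tweet
    return dict(weights)


def weighted_hits(weights, iterations=50):
    """Standard weighted HITS; returns (authority, hub) score dicts."""
    nodes = {n for edge in weights for n in edge}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of j: weighted sum of hub scores of users pointing to j.
        new_auth = {n: 0.0 for n in nodes}
        for (i, j), w in weights.items():
            new_auth[j] += w * hub[i]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub of i: weighted sum of authority scores of users i points to.
        new_hub = {n: 0.0 for n in nodes}
        for (i, j), w in weights.items():
            new_hub[i] += w * auth[j]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return auth, hub
```

Under these assumptions, users would be ranked by the returned authority scores and the top-ranked ones taken as candidate influential transmitters; the cutoff is left open here because the text does not specify one.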
3.1.2 Follower Polarity Classification
Influential transmitters deliver the tweets of a target user to their followers and propagate the target user’s opinion. They can be regarded as strong supporters who share the opinion of the target user. Not only can retweeting a target user’s tweet be seen as agreement with the target user’s opinion, but retweeting an influential transmitter’s tweet can also be regarded as an action supporting the target user’s opinion.
Based on the retweet relationships among followers in the social network, supporters of the target user are classified with strong confidence.
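The retweet-based step can be illustrated with a short Python sketch. The function name `label_supporters` and the input format are hypothetical, and the rule shown (a follower who has retweeted the target user or any influential transmitter is labeled a supporter) is one plausible reading of the description above rather than a confirmed specification.

```python
def label_supporters(followers, retweeted_authors, target, transmitters):
    """Label followers by retweet structure (hypothetical helper).

    retweeted_authors: dict mapping a follower to the set of users
        whose tweets that follower has retweeted.
    Returns a dict follower -> "supporter" or None; None means the
    retweet structure gives no evidence and another classifier is needed.
    """
    anchors = {target} | set(transmitters)
    labels = {}
    for f in followers:
        hit = retweeted_authors.get(f, set()) & anchors
        labels[f] = "supporter" if hit else None
    return labels
```

Followers left with `None` in this sketch correspond to those that the retweet structure cannot classify.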
Then, the followers who cannot be classified by the retweet structure are classified using support vector machines (SVMs), which show high performance in classification. We used bigram and trigram syllables to identify positive or negative sentiments in tweets, constructing a bigram syllable sentiment dictionary after Korean morphological analysis. The dictionary contains 146 positive syllables and 135 negative syllables.
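As an illustration of the syllable features, the following Python sketch extracts bigram and trigram syllables and counts those found in a sentiment lexicon. It is a simplified sketch: the morphological analysis step is omitted, the names `syllable_ngrams` and `ngram_features` are hypothetical, and the lexicon is assumed to be a plain set of strings. The resulting counts could serve as input features to an SVM.

```python
def syllable_ngrams(text, n):
    """Return character n-grams per whitespace-separated token.

    Each precomposed Hangul syllable is one Unicode code point, so
    character n-grams over a token are syllable n-grams for Korean text.
    """
    grams = []
    for token in text.split():
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def ngram_features(text, lexicon):
    """Count bigram and trigram syllables that appear in a lexicon (a set)."""
    counts = {}
    for gram in syllable_ngrams(text, 2) + syllable_ngrams(text, 3):
        if gram in lexicon:
            counts[gram] = counts.get(gram, 0) + 1
    return counts
```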
3.2 Follower Opinion Classification on Social Issue Clusters
Even when a follower supports a target user, the follower can respond positively or negatively toward the target user depending on the particular issue. Social issues should therefore be extracted in order to analyze the opinions of followers.
Based on our observation that not all tweets of a target user are popular, the tweets on social issues are extracted for analysis. To extract issue keywords, the conventional tf-idf weighting scheme is slightly modified.
where t represents a term, tf(t, d) is the term frequency of t in a tweet d, and idf(t, N) = log(N / df(t)). N is the total number of tweets of a target user, and df(t) is the document frequency of the term t. sf(t) = IM(t) / IM_all, where IM(t) is the number of influential transmitters who mention the term t and IM_all is the total number of influential transmitters.
The two terms with the highest values of weight(t) are extracted to represent an issue keyword. Here we used only the tweets of a target user and the influential transmitters to extract issues. Then all tweets of each follower are clustered according to each issue.
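The keyword weighting can be sketched in Python as below. Two points are assumptions made only for illustration: that the three factors are combined as a product, weight(t) = tf · idf · sf, and that tf is summed over all tweets of the target user. The function names and the tokenized input format are likewise hypothetical.

```python
from collections import Counter
from math import log


def issue_keyword_weights(target_tweets, transmitter_tweets):
    """Score terms by tf * idf * sf (combination by product is assumed).

    target_tweets: list of token lists, one per tweet of the target user.
    transmitter_tweets: dict mapping each influential transmitter to a
        list of token lists (that transmitter's tweets).
    """
    n = len(target_tweets)
    # tf: total occurrences over the target user's tweets (assumed).
    tf = Counter(t for tweet in target_tweets for t in tweet)
    # df: number of the target user's tweets containing the term.
    df = Counter(t for tweet in target_tweets for t in set(tweet))
    im_all = len(transmitter_tweets)
    im = Counter()  # IM(t): transmitters who mention the term
    for tweets in transmitter_tweets.values():
        mentioned = {t for tweet in tweets for t in tweet}
        im.update(mentioned)
    weights = {}
    for term in tf:
        idf = log(n / df[term])
        sf = im[term] / im_all if im_all else 0.0
        weights[term] = tf[term] * idf * sf
    return weights


def top_issue_terms(weights, k=2):
    """Return the k terms with the highest weight."""
    return sorted(weights, key=weights.get, reverse=True)[:k]
```

In this sketch a term that no influential transmitter mentions receives sf = 0 and therefore cannot become an issue keyword, which matches the intent of restricting issues to topics that transmitters pick up.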
Tweets are clustered using topic modeling based on the retweet lifetime. Then the opinion expressed in the clustered tweets is analyzed by sentiment analysis using machine learning.
3.2.1 Clustering Tweets on Social Issues Based on LDA
Latent Dirichlet Allocation (LDA) is a fully generative model for describing the latent topics of documents and is a standard tool in topic modeling [8,9,10].
Among LDA-based topic extraction methods for social data, the TimeUserLDA model [11] extracts topics that increase explosively within a particular time span in tweet data. We adapt TimeUserLDA to build an issue cluster for each social issue from the tweets of the target user’s followers, by reflecting the relation between a tweet’s lifetime and the target user in the topic model.
To build clusters of tweets related to a target user’s tweet, two cases are considered: whether or not a tweet falls within the lifetime of the target user’s tweet. The lifetime of a target user’s tweet is defined as the time from the creation of the tweet to the time of its final retweet.
The variables shown in the figure are as follows.
α: prior probability that each document belongs to topic k
β: prior probability that each word belongs to topic k
T: time, t ∈ {1,…,T}
F: set of users writing at time t, f ∈ {1,…,F}
D: all tweets that user u writes at time t, d ∈ {1,…,M}
W: all words appearing in document d, w ∈ {1,…,N}
η_f: probability that topic k appears for user f, f ∈ {1,…,F}, k ∈ {1,…,K}
θ_t: probability that topic k appears within the lifespan of a tweet of a target user, t ∈ {1,…,T}, k ∈ {1,…,K}
φ_c: probability that word w belongs to topic k, k ∈ {1,…,K}, w ∈ {1,…,V}
z: topic ratio of word w in document d, d ∈ {1,…,M}, w ∈ {1,…,N}
K: the number of topics, k ∈ {1,…,K}
π: probability of selecting a topic of the target user rather than an individual topic
A tweet posted by a follower between the creation of the target user’s tweet and its last retweet can be associated, with some probability, with the target user’s tweet.
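The lifetime test itself is simple and can be sketched in Python as follows. The function names and the use of `datetime` values are assumptions for illustration; the sketch covers only the window check, not the topic model that consumes it.

```python
from datetime import datetime


def tweet_lifetime(created_at, retweet_times):
    """Return (start, end): creation time to the final retweet.

    With no retweets the lifetime collapses to the creation instant.
    """
    end = max(retweet_times) if retweet_times else created_at
    return created_at, end


def in_lifetime(follower_time, lifetime):
    """True if a follower's tweet falls inside the target tweet's lifetime."""
    start, end = lifetime
    return start <= follower_time <= end
```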
Follower Opinion Classification for Each Issue. The clustered tweets on each issue are analyzed to determine whether they express a positive, negative, or neutral opinion toward the opinion of the target user. Here, retweeting a tweet can be regarded as a positive opinion toward the target user on the issue; based on the retweeting relationship, such tweets are classified as positive.
For the remaining tweets, we used SVMs to classify a follower’s opinion based on bigram and trigram syllables from a sentiment dictionary of Korean characters, in order to account for users’ writing patterns, in which word variations are common on Twitter.
4 Experiments
4.1 Korean Twitter Test Collection
To evaluate the effectiveness of the proposed method, we constructed a test collection of Korean Twitter data. Four target users were selected, and all of their tweets, all of their followers, and the followers’ tweets were collected from May 1 to May 31, 2013 via the Twitter API. The selected users take positions on political issues and have many followers. Performance is measured in recall, precision, and F1.
Table 1 shows the statistics of followers and the followers’ tweets for each target user.
Table 2 shows the gold standard data for opinion classification of followers. Issues are selected from the clustered issues, and 100 tweets are randomly selected from each issue.
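For reference, the three evaluation measures can be computed against gold standard labels as in the following Python sketch. The function name and the per-class formulation are assumptions; whether the reported scores are per-class or averaged over classes is not stated here.

```python
def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall, and F1 for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```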
4.2 Experimental Results
4.2.1 Results of Follower Polarity Classification
Comparison methods for a follower’s polarity classification are as follows.
Table 3 shows the results of follower polarity classification. Classification with influential transmitters achieves higher performance than the related work [5].
Table 4 shows the precision of the classification for each class. In detecting supporting followers, the proposed method shows higher performance than the comparative method, which indicates that using influential transmitters is effective for classifying supporting followers.
4.2.2 Results of Social Issue Clustering
Comparison methods for tweet clustering are as follows.
-. TimeUserLDA: LDA with time and users to find social issues [11],
-. Proposed method: LDA model applying the lifespan of tweets.
Table 5 shows the results of social issue clustering. For each target user, 200 tweets are randomly selected from each cluster that is judged to be an issue. The experimental results show that the model using the lifespan of tweets is effective.
4.2.3 Results of Opinion Classification of a Follower
Comparison methods for classifying opinions are as follows.
-. SVM based on morpheme features: SVM with morpheme features after Korean morphological analysis,
-. SVM with bigram syllables: SVM with bigram syllable features to represent sentiments,
-. SVM with trigram syllables: SVM with trigram syllable features to represent sentiments.
Table 6 shows the results of opinion classification. The results show that newly coined words and user-made word variations on social networks can be handled by using trigram syllables.
4.2.4 Analysis of Opinions for Followers
Even when a follower supports a target user, the follower can hold an opinion opposite to that of the target user. For each social issue of the target user, the ratio of supporting and non-supporting opinions is analyzed.
Table 7 shows the results of follower polarity classification and of opinion classification for tweets on each issue for the target user. In the opinion classification on particular issues for each target user, there are users who hold opposing opinions, unlike in the follower classification. This analysis shows that a follower does not always support the target user. A target user can take the followers’ responses as advice for future actions.
5 Conclusion
Detecting influential transmitters with a modified HITS algorithm is effective for classifying follower polarity as supporting, non-supporting, or neutral. To analyze followers’ opinions in more detail, the tweets of each follower are clustered with an LDA model that considers the lifespan of tweets, according to the issues mentioned by the target user. The tweets are then classified as expressing a positive or negative opinion toward the target user. From the analysis of the experimental results, we found that supporting followers generally agree with the opinion of the target user; however, some are opposed on particular issues. This research showed that the proposed methods of follower polarity classification and issue-dependent opinion classification are effective. Future research should focus on methods for better user classification and less dependency on followers.