Computación y Sistemas

On-line version ISSN 2007-9737 · Print version ISSN 1405-5546

Comp. y Sist. vol. 23 no. 3, Ciudad de México, Jul./Sep. 2019; Epub Aug 09, 2021

https://doi.org/10.13053/cys-23-3-3279 

Articles of the Thematic Issue

An Exploratory Study of the Use of Senses, Syntax and Cross-Linguistic Information for Subjectivity Detection in Spanish

Rodrigo López1  * 

Daniel Peñaloza1 

Francisco Beingolea1 

Juanjose Tenorio1 

Marco Sobrevilla Cabezudo1 

1 Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, Brazil. a20112387@pucp.pe, daniel.penaloza@pucp.pe, francisco.beingolea@pucp.pe, juanjose.tenorio@pucp.pe, msobrevillac@usp.br.


Abstract

This work presents an exploratory study of Subjectivity Detection for Spanish. The study aims to evaluate the use of dependency relations, word senses and cross-linguistic information in the Subjectivity Detection task. The first steps of this method comprise the labeling of a Spanish corpus and a Word Sense Disambiguation algorithm. Then, cross-linguistic English-Spanish information is obtained from the Semcor corpus and used together with the Spanish data. Finally, this approach (using all gathered information and supervised algorithms) was tested, showing overall better results than the baseline method.

Keywords: Subjectivity detection; dependency relations; WordNet; subjectivity word sense disambiguation; graphs; Spanish

1 Introduction

Subjectivity Detection is the task that aims to determine whether a text is subjective or objective, that is, whether it expresses an opinion or not [19]. According to [11], this task is considered more difficult than polarity classification, which focuses on classifying subjective text as positive or negative. This could be due to several reasons, such as non-subjective sentences being classified as positive or negative, or an objective sentence implying an opinion, which [8] describes as an implicit opinion, among other cases.

Several studies on Sentiment Analysis have been performed, but most of them, including their tools and data, target English [9], which motivates contributing resources for Spanish. Besides, classic methods attribute the subjectivity of a text to the value of its individual words, ignoring other factors such as their senses or the relations between them. According to [13], a sentence may contain "sentiment words", but this is not enough to differentiate an opinion sentence from a non-opinion one. Considering the difficulties mentioned, some examples are shown below:

  • Mi teléfono se apaga una y otra vez. (My cellphone turns off over and over again).

  • El nuevo Samsung Galaxy Note 7 es la bomba. (The new Samsung Galaxy Note 7 is the bomb).

  • Había una bomba en la escuela. (There was a bomb in school).

The first example shows an implicit opinion: an objective sentence that expresses an opinion, in this case a negative one. It is worth emphasizing that the expression una y otra vez is associated with the word apaga, adding a new value to the sentence and making it impossible to consider the sentence an objective one. The second example is a subjective sentence containing the word bomba, while the third is an objective sentence with the same word, so a traditional classifier could have difficulties, since the senses of words are not considered.

This work presents an exploratory study of Subjectivity Detection for Spanish that considers both the dependency relations of the words and the word senses in the detection process.

Also, due to the lack of annotated resources (corpora with sense and subjectivity annotation), and in order to evaluate the cross-linguistic potential, we experimented with using English resources to train the subjectivity detector and compared the results with training on a manually annotated portion of a Spanish corpus.

The paper is organized as follows. Related work is presented in Section 2. Section 3 describes how the knowledge for Subjectivity Detection is gathered. Section 4 tests the data obtained in Section 3 with supervised learning methods to detect subjectivity in sentences. Finally, Section 5 presents the conclusions of this work.

2 Related Works

A semantic orientation-based approach is presented in [7]. Negation and POS-tagging were used to choose the best features for subjectivity detection among unigrams and phrases. SentiWordNet [1] was used with Pointwise Mutual Information (PMI) [4] to determine the semantic orientation of English documents, with good results for several features.

A study using information from Spanish tweets was proposed by [17]. Tweets include a lot of information besides the text, both structured and unstructured, which was exploited in this work. With these categories, different features were used, such as emoticons, favorites, and retweets. Then, different supervised learning algorithms were applied to each kind of data, obtaining interesting results.

A framework for subjectivity detection using features in English and Spanish is shown in [3]. This framework uses an Extreme Learning Machine (ELM), described in [6], with Bayesian networks supporting its structure. First, the text is converted into a vector of words. This vector is processed by a deep convolutional neural network together with the ELM. Finally, a Fuzzy Recurrent Neural Network classifies the initial text as positive, negative or neutral.

The work proposed by [14] presented an unsupervised Word Sense Disambiguation strategy for Subjectivity Detection in English. This approach relied on labeled data from different resources and on information from SentiWordNet, since its focus was simply to determine the subjectivity value of senses. After this, a rule-based method that counts words was used to classify sentences, and it was also tested with supervised classifiers.

A rule-based method that uses knowledge from WordNet [12] and SentiWordNet for texts in Spanish is presented in [2]. It includes graph-based Word Sense Disambiguation over WordNet's senses to get subjectivity values. Then, each word, depending on its subjectivity value and its lexical category, gets a different weight to determine the subjectivity of a sentence. Besides, to obtain some of its parameters and values, this work used the Semcor corpus [10] in order to evaluate the usefulness of resources from other languages.

These studies show some important points. Firstly, since there is not enough information for Spanish, using English resources may help, and the use of WordNet and SentiWordNet is very common. Secondly, there are different techniques, but most of them rely on the words alone, which means this work can be a good contribution. Thirdly, some approaches proved that using features beyond the text itself may yield great results. Finally, Word Sense Disambiguation is helpful and improves the results for this task.

3 Subjectivity Detection

This work is based on graphs and dependency relations, which are used in a Word Sense Disambiguation method. Then, these results, together with the same relations, are used as features for supervised learning algorithms to determine the subjectivity of sentences. The steps required for this are explained in the next subsections.

3.1 Preliminaries

3.1.1 Corpus Annotation

In order to explore subjectivity detection strategies for Spanish, the FilmAffinity corpus [2] was used. This corpus contains 2,500 objective sentences and 2,500 subjective sentences. In this paper, we will use the subjective sentence presented in Example 1 to explain the steps performed in our work.

Example 1. Este inspirador drama, mientras que trafica con clichés, logra no entregar su mensaje de una manera demasiado pesada. (This inspiring drama, while it traffics in clichés, manages not to deliver its message in too heavy-handed a manner.)

Since this corpus only contains subjectivity information for each sentence, and our work focused on exploring the use of fine-grained information in subjectivity detection, it was necessary to incorporate more knowledge into the corpus manually. In this case, information about senses was incorporated. The senses used were extracted from the Multilingual Central Repository 3.0 (MCR) [5], which includes WordNets for different European languages (including Spanish) and is aligned with Princeton WordNet 3.0 [12] and SentiWordNet 3.0 [1] (which adds polarity information to senses).

The annotation process on content words, i.e., Nouns (N), Verbs (V), Adjectives (A) and Adverbs (R), used the MCR's senses. To determine the words belonging to these grammatical categories and their respective lemmas, Freeling 4.0 [15] was used.

Since annotation is a long and difficult task, just a small percentage of the sentences of the corpus was annotated, namely 8% (200 objective and 200 subjective sentences). The annotation was performed by four annotators with knowledge of Natural Language Processing. Besides the sense annotation, information from SUMO was extracted, taking advantage of the alignments between SUMO and the MCR. SUMO contains information about attributes; specifically, any word may have multiple attributes, but in this case only SubjectiveAssessmentAttribute was extracted for the annotation.

Some annotated tokens of Example 1 and their respective information are presented in Table 1. The column "Sense Identifier" contains the InterLingual Index, WordNet's identifier, which is useful to map between WordNets and SentiWordNet. In Table 1, several senses may be seen for each word. For example, the word "drama" presents three senses, associated with the synonyms "dramaturgia" and "dramática"; "obra teatral"; and "evento dramático" and "tragedia", respectively. Glosses and SUMO attributes are also presented for each word sense. With this information, annotators had to choose the most adequate sense in each sentence.

Table 1 Words and Senses Information

Word/Lemma  Tag  Sense Identifier   Synonyms                                        Gloss  Attributes
Drama       N    spa-30-06376154-n  dramaturgia, dramática                                 Text
                 spa-30-07007945-n  obra_teatral                                           Text
                 spa-30-07290278-n  evento_dramático, tragedia                             SubjectiveAssessmentAttribute
demasiado   R    spa-30-00047392-r  excesivamente, demasiadamente                          SubjectiveAssessmentAttribute
                 spa-30-00415963-r  en_demasía, excesivamente, más de_lo necesario         SubjectiveAssessmentAttribute

3.1.2 Subjectivity Annotation

Even though senses were annotated, this study focused on subjectivity information; therefore, information from SentiWordNet was incorporated, taking advantage of the alignments with the MCR. SentiWordNet is focused on sentiment analysis and assigns positive, negative and neutral scores to each word sense, which must sum to 1. Thus, the subjectivity score was defined as the sum of the positive and negative scores, and the objectivity score was defined by the neutral score. After this, four subjectivity categories (non-subjectivity or NS, low subjectivity or LS, middle subjectivity or MS, and high subjectivity or HS) were defined according to the subjectivity score. Table 2 presents the range of each category. Also, senses with SubjectiveAssessmentAttribute were annotated as HS.

Table 2 Subjectivity Categories

Category                  NS   LS       MS       HS
Subjectivity Score Range  0    ≤ 0.25   ≤ 0.50   > 0.50
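As an illustration, the mapping from SentiWordNet scores to these categories can be sketched as follows (the function name and the SUMO override flag are our own; the paper does not prescribe an implementation):

```python
def subjectivity_category(pos_score, neg_score, has_subjective_attr=False):
    """Map a sense's SentiWordNet scores to a subjectivity category.

    Positive, negative and neutral scores sum to 1, so the subjectivity
    score is pos + neg; senses carrying SUMO's
    SubjectiveAssessmentAttribute are annotated as HS directly.
    """
    if has_subjective_attr:
        return "HS"
    score = pos_score + neg_score
    if score == 0:
        return "NS"    # non-subjective
    if score <= 0.25:
        return "LS"    # low subjectivity
    if score <= 0.50:
        return "MS"    # middle subjectivity
    return "HS"        # high subjectivity
```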

3.2 Subjectivity Word Sense Disambiguation (SWSD)

The first step in our proposal consisted in determining the subjectivity category of each word in a target sentence in order to use these categories to determine the subjectivity of the overall sentence. To achieve this, an adaptation of the work proposed by [18] was performed, because our work focused on disambiguating senses instead of words.

3.2.1 Graph Building

Similar to [18], the graph for a sentence was built from its dependency tree. Thus, the nodes were defined by the senses of the words (obtained from the MCR and SentiWordNet) and the edges were defined by the dependency relations included in the dependency tree.

Figure 1 shows the dependency tree of Example 1 generated by Freeling, which will be used in this section.

Fig. 1 Dependency Tree of Example 1 

In order to evaluate the level of granularity of the nodes (in relation to senses and subjectivity), two configurations for the graphs were tested. The first one, called Separated Graph, considered a word-sense for each node. The second one, called Grouped Graph, considered a group of senses with the same subjectivity category for each node.

The weight of an edge was defined as the inverse of the distance between the two nodes in the WordNet knowledge graph, since two senses were considered more related when they are closer in the graph. Distances were obtained by applying Dijkstra's algorithm on the whole WordNet knowledge graph.
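A minimal sketch of the distance computation, assuming the WordNet knowledge graph is given as an adjacency map (the function and variable names are illustrative, not from the paper):

```python
import heapq

def sense_distances(adjacency, source):
    """Dijkstra's algorithm from one sense over the knowledge graph.

    adjacency: dict sense -> iterable of (neighbour, cost); with unit
    costs the result is the path length whose inverse later serves as
    the edge weight in the sentence graph.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already settled with a shorter path
        for v, cost in adjacency.get(u, ()):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

With unit edge costs this reduces to breadth-first path length, but Dijkstra also covers weighted relation types.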

The weights in Separated Graphs were easier to calculate (the aforementioned definition was used directly), since each node contained only one sense. In Grouped Graphs, the weight of an edge was defined as the maximum value over the relations between the senses involved in that edge.
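The Grouped Graph construction can be sketched as follows, assuming a precomputed WordNet distance function such as the Dijkstra distances described above (all helper names and input shapes are our own assumptions):

```python
from itertools import product

def build_grouped_graph(dep_edges, word_senses, sense_category, wn_distance):
    """Build the Grouped Graph configuration.

    dep_edges      : list of (head_word, dependent_word) dependency pairs
    word_senses    : dict word -> list of sense identifiers
    sense_category : dict sense -> subjectivity category ("NS".."HS")
    wn_distance    : callable (sense_a, sense_b) -> WordNet path length

    One node per (word, category) group; each edge follows a dependency
    relation and is weighted by the maximum inverse distance over all
    sense pairs it connects.
    """
    nodes = {}  # (word, category) -> list of grouped senses
    for word, senses in word_senses.items():
        for s in senses:
            nodes.setdefault((word, sense_category[s]), []).append(s)

    edges = {}
    for head, dep in dep_edges:
        head_nodes = [n for n in nodes if n[0] == head]
        dep_nodes = [n for n in nodes if n[0] == dep]
        for a, b in product(head_nodes, dep_nodes):
            edges[(a, b)] = max(1.0 / wn_distance(sa, sb)
                                for sa, sb in product(nodes[a], nodes[b]))
    return nodes, edges
```

The Separated Graph variant is the special case where every node holds exactly one sense, so the `max` ranges over a single pair.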

Figure 2 shows a subgraph of the graph generated for Example 1 under the Grouped Graph configuration. As may be seen, nodes contain one or more senses of a word according to the subjectivity category. For example, the node of the word "logra" belonging to the Low Subjectivity category (LS) groups four senses (sense identifiers are shown in Figure 2). Also, nodes are connected to other nodes according to their dependency relation. For example, the connection between "logra" and "drama" is defined by the dependency relation "subj" (as may be seen in Figure 1).

Fig. 2 Subgraph of the example sentence 

3.2.2 SWSD

After graph building, the subjectivity word sense disambiguation method was applied. Similar to [18], the PageRank algorithm [16] was executed. Equation (1) shows the PageRank formulation:

Pr = c · M · Pr + (1 − c) · v. (1)

This equation is used on a graph G with N vertices, with the following variables: the vector Pr contains the resulting value for each vertex of the graph; the constant c is PageRank's damping factor; M is a square matrix (N × N) whose element M_ji equals 1/d_i if there is a relation between vertices i and j, and 0 otherwise, where d_i is the number of edges going out from vertex i; and v is a vector containing a value for each vertex of the graph, usually 1/N.

In this case, an adaptation of the algorithm used in [18] was applied to both graph configurations. Thus, some changes were implemented. For example, the definition of the cell values of matrix M was changed as shown in Equation (2):

M_ji = w_ij / Σ_z w_iz. (2)

In this equation, w_ij is the weight of the edge between vertices i and j, and the sum runs over the weights of all edges that start from vertex i. Besides that, the Pr vector was initialized with the value 1/N for each vertex, and vector v held the sense frequencies obtained from WordNet, used as probabilities for each vertex. Finally, the damping factor used was 0.85 and the number of iterations was 30.
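The adapted iteration of Equations (1) and (2) can be sketched as follows (a plain-Python sketch under our own naming; the paper's implementation details beyond c = 0.85 and 30 iterations are not specified):

```python
def weighted_pagerank(weights, priors, c=0.85, iterations=30):
    """Weighted, personalized PageRank following Equations (1)-(2).

    weights : dict (i, j) -> w_ij, weight of the directed edge i -> j
    priors  : dict vertex -> prior probability (e.g. WordNet sense
              frequency), playing the role of vector v
    Returns a dict vertex -> rank.
    """
    vertices = list(priors)
    pr = {u: 1.0 / len(vertices) for u in vertices}  # Pr starts at 1/N
    out = {u: 0.0 for u in vertices}
    for (i, _), w in weights.items():
        out[i] += w                                   # sum_z w_iz
    for _ in range(iterations):
        # new Pr_j = c * sum_i M_ji * Pr_i + (1 - c) * v_j
        pr = {j: c * sum((w / out[i]) * pr[i]
                         for (i, k), w in weights.items()
                         if k == j and out[i] > 0)
                 + (1 - c) * priors[j]
              for j in vertices}
    return pr
```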

3.3 Bringing Cross-linguistic Knowledge

Since the corpus annotation already described is important but laborious, gathering more data was necessary. Considering there were not many resources for a non-English language, cross-linguistic knowledge was used. Besides, Section 2 showed that using English resources could be useful, and this was an opportunity to evaluate their influence on the results.

In this case, the Semcor corpus was used. Semcor contains 20,138 sentences annotated with WordNet's senses, but it is not tagged with a subjectivity classification at sentence level. Therefore, OpinionFinder 2.0 [20] was executed to automatically label which sentences were objective and subjective. OpinionFinder is a sentiment analysis tool that reports high precision (91.7%) in the Subjectivity Detection task. After the execution of OpinionFinder, 934 objective sentences and 934 subjective sentences were selected to compose the English corpus.

To conclude this section, it is important to note that, since the tools used for this corpus were different, some information available for the FilmAffinity corpus was missing. For example, the SUMO ontology was not available, nor were the relations between senses from WordNet, due to label differences; only the senses and their subjectivity values were found. Also, since there were too many sentences in Semcor to check subjectivity manually, OpinionFinder was used. These differences in how both corpora were processed may lead to different results, which will be described later.

3.4 Incorporating Syntactic Knowledge

In order to evaluate the contribution of syntactic knowledge, several experiments were performed. Firstly, the words and their subjectivity were considered as features, since the majority of works rely on this. Thus, 16 features were considered, obtained by combining the 4 subjectivity categories with the 4 grammatical categories; these were called grammatical features.

Secondly, this work proposes using the obtained relations together with the words and categories as features. Pairing the 16 grammatical features with each other according to their dependency relations results in 136 features (the unordered pairs with repetition), called dependency features. Next, it was decided to mix the grammatical and dependency features to evaluate their combined use. Finally, all these feature sets were used with different supervised learning methods. Some examples of the dependency features, using Example 1, are shown in Table 3.

Table 3 Final Features 

Relations Features
inspirador - drama A-HS N-NS
drama - logra N-NS V-LS
demasiado - pesada R-HS V-NS
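The enumeration of the 16 grammatical and 136 dependency features, and the counting of relations like those in Table 3, can be sketched as follows (names are illustrative; the actual feature encoding is not detailed in the paper):

```python
from itertools import combinations_with_replacement

POS_TAGS = ["N", "V", "A", "R"]
CATEGORIES = ["NS", "LS", "MS", "HS"]
# 4 POS tags x 4 subjectivity categories = 16 grammatical features.
GRAMMATICAL = [f"{p}-{c}" for p in POS_TAGS for c in CATEGORIES]
# Unordered pairs with repetition: 16 * 17 / 2 = 136 dependency features.
DEPENDENCY = [tuple(sorted(p))
              for p in combinations_with_replacement(GRAMMATICAL, 2)]
INDEX = {pair: i for i, pair in enumerate(DEPENDENCY)}

def dependency_feature_vector(relations):
    """Count vector over the 136 dependency features; each relation is
    a pair of grammatical features such as ("A-HS", "N-NS")."""
    counts = [0] * len(DEPENDENCY)
    for a, b in relations:
        counts[INDEX[tuple(sorted((a, b)))]] += 1
    return counts
```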

4 Results and Discussion

4.1 Corpus Annotation

In relation to the corpus annotation, it is important to regard the following details:

  • — We used the senses from WordNet with information in Spanish; however, sometimes that was not enough to choose the appropriate sense, so information from WordNet in English was checked too, and when there was still not enough information to resolve the confusion between similar senses, the most frequent one was considered the right one.

  • — There were cases in which an appropriate sense could not be found for a word in either Spanish or English; in those cases, a lemma that could be considered similar in the specific context was used to search for the right sense.

  • — There were words that were wrongly classified by the POS-tagger; in those cases, the POS-tag was changed to an adequate one and the search used the lemma with its new tag, applying the first and/or second consideration if necessary.

  • — When the first, second and third items were not enough for a word, it was left blank, since there were cases in which a word has no meaning by itself (such as proper names or modal verbs), making it impossible to find any sense at all.

Finally, 4,620 words (belonging to the 400 sentences extracted from the original corpus) were annotated and used to evaluate the recall of the subjectivity word sense disambiguation (SWSD) method.

4.2 Subjectivity Word Sense Disambiguation

In relation to the SWSD, the results obtained from the SWSD method (grouped and separated graphs) were compared with a baseline. In our case, the Most Frequent Sense (MFS) heuristic was selected as the baseline: every word is labeled with its most frequent sense from WordNet. The results are shown in Table 4.

Table 4 SWSD summary

       Grouped Graphs    Separated Graphs   Graphs    MFS
       Correct   R       Correct   R        Total     Correct   R      Total
Noun   1210      0.82    1199      0.81     1473      1817      0.83   2198
Verb   686       0.63    716       0.66     1084      750       0.65   1162
Adj.   625       0.74    623       0.74     840       665       0.74   897
Adv.   266       0.84    269       0.85     316       303       0.83   363
All    2787      0.75    2807      0.76     3713      3535      0.77   4620

Table 4 shows the results of all methods tested. As may be seen, using Separated Graphs produced better results than Grouped Graphs, even though the difference may not be significant. Also, the results of SWSD using separated graphs outperformed the results of MFS in all grammatical categories except nouns, which produced a worse, although not significantly worse, overall performance (due to the frequency of annotated nouns).

These results may be explained by different reasons. For example, there were problems with the tools used, as explained earlier in this section, and only a small amount of data could be annotated. Besides, our method could not analyze all the data, since it relies on relations between words, which was not the case for MFS.

Finally, it is important to mention that the most common mistakes of the WSD algorithms used in this work involved verbs a significant number of times. This could be explained by the problems already mentioned in the annotation process, such as problems with the tools (POS-tagger) or finding the appropriate sense for a word, since some words, especially verbs, are associated with a large number of senses, making the task more difficult for the algorithm.

4.3 Subjectivity Detection

As mentioned in Section 3.3 and Section 3.4, we evaluated the use of Semcor (the English corpus) and the use of dependency relations in the subjectivity detection task. Also, we tested the usefulness of subjectivity word sense disambiguation in subjectivity detection. Thus, we experimented with the following configurations: grouped graphs and separated graphs; training on Semcor, FilmAffinity and both corpora together; and training with dependency features, grammatical features and both feature sets together.

All experiments were performed using the Linear SVM algorithm (with a C value of 0.01) with all features normalized (without feature selection or dimensionality reduction). Also, a non-SWSD baseline was used; specifically, this baseline does not use WSD to obtain the subjectivity of words, which is instead defined by the mean score of all their respective senses. Besides, we compared our results with the proposal presented in [2], a rule-based method that uses a subjectivity word sense disambiguation algorithm to perform subjectivity detection on the same corpus. In order to evaluate our experiments, we tested on a sub-corpus of the FilmAffinity corpus composed of 500 sentences (250 objective and 250 subjective). Besides, to evaluate the use of general features, another method was compared with our proposal, one that used Bag of Words (BOW) with TF-IDF on the FilmAffinity corpus.
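The classification setup can be sketched with scikit-learn, assuming count vectors like those of Section 3.4; the feature matrix below is random toy data standing in for the real vectors (16 grammatical + 136 dependency features), so only the pipeline mirrors the paper:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Toy stand-in: 152 columns = 16 grammatical + 136 dependency counts.
rng = np.random.default_rng(0)
X = rng.random((40, 152))
y = np.array([0, 1] * 20)   # 0 = objective, 1 = subjective
X[y == 1, 0] += 3.0         # one artificially informative feature

# Linear SVM with C = 0.01 over normalized features, as in the paper
# (no feature selection or dimensionality reduction).
clf = make_pipeline(StandardScaler(), LinearSVC(C=0.01))
clf.fit(X, y)
```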

Table 5 shows the results of all experiments performed. Using SWSD, our methods, especially grouped graphs, showed a slightly better performance than the baseline in all the experiments, except for the baseline trained on FilmAffinity with grammatical features, which showed the best results. This may be related to nouns, which are more frequent in the corpus: according to the labeled corpus, most of them tend to be N-NS in objective sentences and N-HS in subjective sentences. Besides, noun senses may belong to any subjectivity category, so the graph methods may make more mistakes than taking the mean score of the senses. Comparing with the other works (BOW and [2]), all our best methods (trained on FilmAffinity and using grammatical features) outperformed their results, with BOW obtaining the worst results. One point to highlight is that the work proposed in [2] used Semcor as the training corpus and obtained results comparable with our methods that used the same features and the same corpus.

Table 5 Subjectivity Detection Results

Method            Training Corpus       Features                    Objectivity         Subjectivity        Average F1
                                                                    P     R     F1      P     R     F1
Grouped Graphs    Semcor                Dependency                  0.76  0.64  0.70    0.58  0.71  0.64    0.67
                                        Grammatical                 0.58  0.78  0.67    0.84  0.67  0.74    0.71
                                        Dependency + Grammatical    0.66  0.69  0.67    0.71  0.67  0.69    0.68
                  FilmAffinity          Dependency                  0.89  0.70  0.78    0.62  0.85  0.71    0.76
                                        Grammatical                 0.88  0.75  0.81    0.71  0.85  0.77    0.79
                                        Dependency + Grammatical    0.88  0.72  0.79    0.66  0.84  0.74    0.77
                  Semcor+FilmAffinity   Dependency                  0.76  0.75  0.76    0.74  0.76  0.75    0.75
                                        Grammatical                 0.71  0.79  0.75    0.81  0.74  0.77    0.76
                                        Dependency + Grammatical    0.74  0.76  0.75    0.76  0.74  0.75    0.75
Separated Graphs  Semcor                Dependency                  0.75  0.64  0.69    0.59  0.70  0.64    0.67
                                        Grammatical                 0.56  0.78  0.65    0.84  0.66  0.74    0.71
                                        Dependency + Grammatical    0.64  0.68  0.66    0.70  0.66  0.68    0.67
                  FilmAffinity          Dependency                  0.88  0.70  0.78    0.63  0.84  0.72    0.76
                                        Grammatical                 0.86  0.74  0.79    0.70  0.83  0.76    0.78
                                        Dependency + Grammatical    0.85  0.71  0.78    0.66  0.81  0.73    0.76
                  Semcor+FilmAffinity   Dependency                  0.77  0.73  0.75    0.71  0.76  0.73    0.74
                                        Grammatical                 0.68  0.78  0.73    0.81  0.72  0.76    0.75
                                        Dependency + Grammatical    0.72  0.75  0.74    0.76  0.73  0.74    0.74
Baseline          Semcor                Dependency                  0.40  0.68  0.51    0.82  0.58  0.68    0.63
                                        Grammatical                 0.28  0.90  0.42    0.97  0.57  0.72    0.67
                                        Dependency + Grammatical    0.36  0.74  0.48    0.88  0.58  0.70    0.64
                  FilmAffinity          Dependency                  0.71  0.69  0.70    0.68  0.70  0.69    0.69
                                        Grammatical                 0.77  0.81  0.79    0.82  0.78  0.80    0.80
                                        Dependency + Grammatical    0.68  0.75  0.71    0.78  0.71  0.74    0.73
                  Semcor+FilmAffinity   Dependency                  0.53  0.72  0.61    0.79  0.63  0.70    0.67
                                        Grammatical                 0.49  0.85  0.62    0.91  0.64  0.75    0.71
                                        Dependency + Grammatical    0.58  0.75  0.65    0.81  0.66  0.73    0.70
[2]               Semcor                Grammatical                 0.74  0.60  0.66    0.66  0.78  0.72    0.70
BOW               FilmAffinity          TF-IDF                      0.00  0.00  0.00    1.00  0.50  0.67    0.67

In relation to the cross-linguistic knowledge, the FilmAffinity corpus showed the best and most consistent results for all features and methods, which suggests that the Semcor corpus may not be compatible with this work. Specifically, some subjective texts in the FilmAffinity corpus were composed of 2 to 4 sentences together, unlike Semcor, where this never happened. Also, given the considerable difference in size between both corpora, Semcor evidently contained far more senses and dependency relations. So, considering the described differences between tools, corpus data, words and/or features, it does not seem that both corpora can be used together, or that good results would come from using the English information.

Finally, the dependency features were useless in all experiments, even harming performance when mixed with the grammatical features. One possible reason is that Freeling still struggles with dependency relations; thus, we could lose a lot of information from a sentence, leading to worse results.

In order to perform a deeper analysis, we examined the false positives, with some points to remark. In the models trained on Semcor, most of the errors were related to features with the HS category: since the other categories were dominant, the presence of HS could easily confuse the classifiers. Next, with the FilmAffinity corpus, the mistakes were related specifically to the most common features from two categories: the association between A-HS and N-NS was very common, so it was easily confused. However, this happened in specific situations, such as sentences with few relations including this feature, or when using words and relations together, since the words, being more frequent, carry more weight, increasing the probability of errors.

After this, with both corpora together, the mistakes were similar, since the FilmAffinity corpus is small in comparison, but it is interesting to note that this mix of corpora improved the results obtained with Semcor alone. As a final point, it is important to mention that Semcor was checked manually (around 400 sentences) and many mistakes were found, since most of the sentences looked objective; this could be due to the tools used or to the kind of text in the corpus, which is related to news. The sentences were corrected, but with only a small positive change in the results, confirming that the use of Semcor was not the best choice for this work.

5 Conclusions and Final Remarks

In this paper, an exploratory study about subjectivity detection for Spanish was presented. We explored the use of Word Sense Disambiguation to identify senses' subjectivity; the incorporation of syntactic information to subjectivity detection; and the use of cross-linguistic information, specifically English, to train supervised models for Subjectivity Detection.

The SWSD was on par with the selected baseline, so, considering that the results were not bad, gathering more labeled data will be important for the evaluation of this method, in order to see how the results might change in all parts of this work. Then, for the subjectivity detection of texts, the Semcor corpus was also used in the experiments.

Considering the differences in tools, data and knowledge, and given the final results, it was determined that the information from English was not compatible with this work, or that perhaps Semcor was not an appropriate corpus, due to its nature or to being labeled inaccurately, either in its senses or by OpinionFinder. Finally, the experiments proposed here showed good results for the subjectivity detection task for both kinds of graphs, with grouped graphs being better, indicating that this approach is useful and that other works could benefit from it.

Finally, future work includes annotating more data from FilmAffinity, to see if the results may be improved; testing data from another domain, to see whether the results change; using appropriate data from English or another language, labeling the information if necessary; and using some of these features with a polarity classification tool to evaluate their usefulness.

References

1. Baccianella, S., Esuli, A., & Sebastiani, F. (2010). Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. LREC, volume 10, pp. 2200-2204. [ Links ]

2. Cabezudo, M. A. S., Palomino, N. L. S., & Perez, R. M. (2015). Improving subjectivity detection for Spanish texts using subjectivity word sense disambiguation based on knowledge. Latin American Computing Conference (CLEI), IEEE, pp. 1-7. [ Links ]

3. Chaturvedi, I., Ragusa, E., Gastaldo, P., Zunino, R., & Cambria, E. (2018). Bayesian network based extreme learning machine for subjectivity detection. Journal of The Franklin Institute, Vol. 355, No. 4, pp. 1780-1797. [ Links ]

4. Church, K. W. & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational linguistics, Vol. 16, No. 1, pp. 22-29. [ Links ]

5. Gonzalez-Agirre, A., Laparra, E., & Rigau, G. (2012). Multilingual central repository version 3.0. LREC, pp. 2525-2529. [ Links ]

6. Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006). Extreme learning machine: theory and applications. Neurocomputing, Vol. 70, No. 1-3, pp. 489-501. [ Links ]

7. Khanna, S. & Shiwani, S. (2013). Subjectivity detection and semantic orientation based methods for sentiment analysis. International Journal of Scientific and Engineering Research. [ Links ]

8. Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, Vol. 5, No. 1, pp. 1-167. [ Links ]

9. Lo, S. L., Cambria, E., Chiong, R., & Cornforth, D. (2017). Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, Vol. 48, No. 4, pp. 499-527. [ Links ]

10. Mihalcea, R. (1998). Semcor semantically tagged corpus. Unpublished manuscript. [ Links ]

11. Mihalcea, R., Banea, C., & Wiebe, J. (2007). Learning multilingual subjective language via cross-lingual projections. Proceedings of the 45th annual meeting of the association of computational linguistics, pp. 976-983. [ Links ]

12. Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, Vol. 38, No. 11, pp. 39-41. [ Links ]

13. Narayanan, R., Liu, B., & Choudhary, A. (2009). Sentiment analysis of conditional sentences. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, Association for Computational Linguistics, pp. 180-189. [ Links ]

14. Ortega, R., Fonseca, A., Gutierrez, Y., & Montoyo, A. (2013). Improving subjectivity detection using unsupervised subjectivity word sense disambiguation. Procesamiento del Lenguaje Natural, Vol. 51, pp. 179-186. [ Links ]

15. Padró, L. & Stanilovsky, E. (2012). Freeling 3.0: Towards wider multilinguality. LREC2012. [ Links ]

16. Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab. [ Links ]

17. Sixto, J., Almeida, A., & López-de Ipiña, D. (2016). An approach to subjectivity detection on Twitter using the structured information. International Conference on Computational Collective Intelligence, Springer, pp. 121-130. [ Links ]

18. Sobrevilla-Cabezudo, M. A., Oncevay-Marcos, A., & Melgar, A. (2017). Sense dependency-rank: A word sense disambiguation method based on random walks and dependency trees. International Conference on Computational Linguistics and Intelligent Text Processing, Springer, pp. 185-194. [ Links ]

19. Wiebe, J. & Riloff, E. (2005). Creating subjective and objective sentence classifiers from unannotated texts. International conference on intelligent text processing and computational linguistics, Springer, pp. 486-497. [ Links ]

20. Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., Cardie, C., Riloff, E., & Patwardhan, S. (2005). Opinionfinder: A system for subjectivity analysis. Proceedings of HLT/EMNLP 2005 Interactive Demonstrations, pp. 34-35. [ Links ]

4. This tool was used because it shows good results in the Subjectivity Detection task.

Received: February 14, 2019; Accepted: March 04, 2019

* Corresponding author is Rodrigo Lopez. a20112387@pucp.pe.

Creative Commons License This is an open-access article distributed under the terms of the Creative Commons Attribution License