Scielo RSS <![CDATA[Computación y Sistemas]]> vol. 18 num. 3 lang. en <![CDATA[<b>Editorial</b>]]> <![CDATA[<b>Why Has Artificial Intelligence Failed? And How Can it Succeed?</b>]]> In the 1960s, pioneers in artificial intelligence made grand claims that AI systems would surpass human intelligence before the end of the 20th century. Except for beating the world chess champion in 1997, none of the other predictions have come true. But AI research has contributed a huge amount of valuable technology, which has proved to be successful on narrow, specialized problems. Unfortunately, the field of AI has fragmented into those narrow specialties. Many researchers claim that their specialty is the key to solving all the problems. But the true key to AI is the knowledge that there is no key. Human intelligence comprises every specialty that anyone in any culture or civilization has ever dreamed of. Each one is adequate for a narrow range of applications. The power of human intelligence comes from the ability to relate, combine, and build on an open-ended variety of methods for different applications. Successful AI systems require a framework that can support any and all such combinations. <![CDATA[<b>Structural Isomorphism of Meaning and Synonymy</b>]]> In this paper I am going to deal with the phenomenon of synonymy from the logical point of view. In Transparent Intensional Logic (TIL), which is my background theory, the sense of an expression is an algorithmically structured procedure detailing what operations to apply to what procedural constituents to arrive at the object (if any) denoted by the expression. Such procedures are rigorously defined as TIL constructions. In this new orthodoxy of structured meanings and procedural semantics we encounter the problem of the granularity of procedure individuation. Though the identity of TIL constructions is rigorously defined, they are a bit too fine-grained from the procedural point of view. 
In an effort to solve the problem we introduced the notion of procedural isomorphism. Any two terms or expressions whose respective meanings are procedurally isomorphic are deemed semantically indistinguishable, hence synonymous and thus substitutable in any context, whether extensional, intensional or hyperintensional. The novel contribution of this paper is a formally worked-out, philosophically motivated criterion of hyperintensional individuation, which is defined in terms of a slightly more carefully formulated version of α-conversion and β-conversion by value, which amounts to a modification of Church's Alternative (A1). <![CDATA[<b>Inferring Relations and Annotations in Semantic Network</b>: <b>Application to Radiology</b>]]> Domain-specific ontologies are invaluable despite the many challenges associated with their development. In most cases, domain knowledge bases are built with very limited scope, without considering the benefits of plugging domain knowledge into a general ontology. Furthermore, most existing resources lack meta-information about association strength (weights) and annotations (frequency information such as frequent or rare, or relevance information such as pertinent or irrelevant). In this paper, we present a semantic resource for radiology built over an existing general semantic lexical network (JeuxDeMots). This network combines weights and annotations on typed relations between terms and concepts. Some inference mechanisms are applied to the network to improve its quality and coverage. We extend these mechanisms to relation annotation. We describe how annotations are handled and how they improve the network by imposing new constraints, especially those founded on medical knowledge. <![CDATA[<b>Spotting Fake Reviews using Positive-Unlabeled Learning</b>]]> Fake review detection has been studied by researchers for several years. However, so far all reported studies are based on English reviews. 
This paper reports a study of detecting fake reviews in Chinese. Our review dataset is from the Chinese review hosting site Dianping, which has built a fake review detection system. They are confident that their algorithm has a very high precision, but they don't know the recall. This means that all fake reviews detected by the system are almost certainly fake, but the remaining reviews may not all be genuine. This paper first reports a supervised learning study of two classes, fake and unknown. However, since the unknown set may contain many fake reviews, it is more appropriate to treat it as an unlabeled set. This calls for the model of learning from positive and unlabeled examples (or PU learning). Experimental results show that PU learning not only outperforms supervised learning significantly, but also detects a large number of potentially fake reviews hidden in the unlabeled set that Dianping fails to detect. <![CDATA[<b>Using Multi-View Learning to Improve Detection of Investor Sentiments on Twitter</b>]]> Stock-related messages on social media have several interesting properties regarding the sentiment analysis (SA) task. On the one hand, the analysis is particularly challenging because of frequent typos, bad grammar, and idiosyncratic expressions specific to the domain and media. On the other hand, stock-related messages primarily refer to the state of specific entities, namely companies and their stocks, at specific times (of sending). This state is an objective property and even has a measurable numeric characteristic, namely, the stock price. Given a large dataset of Twitter messages, we can create two separate "views" on the dataset by analyzing the text of messages and external properties separately. With this, we can expand the coverage of generic SA tools and learn new sentiment expressions. In this paper, we experiment with this learning method, comparing several types of general SA tools and sets of external properties. 
The method is shown to produce significant improvement in accuracy. <![CDATA[<b>Soft Similarity and Soft Cosine Measure</b>: <b>Similarity of Features in Vector Space Model</b>]]> We show how to consider similarity between features for calculation of similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, words "play" and "game" are different but related. When there is no similarity between features then our soft similarity measure is equal to the standard similarity. For this, we generalize the well-known cosine similarity measure in VSM by introducing what we call "soft cosine measure". We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: entrance exams question answering task at CLEF. In these experiments, we use syntactic n-grams as features and Levenshtein distance as the similarity between n-grams, measured either in characters or in elements of n-grams. 
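The reduction property stated in the soft cosine abstract can be illustrated with a small sketch. It assumes the usual bilinear form of the measure, that is, the sum of s_ij * a_i * b_j divided by the square roots of the corresponding sums for (a, a) and (b, b). The function name soft_cosine, the three-feature toy vocabulary, and the similarity value 0.5 between "play" and "game" are illustrative assumptions, not values taken from the paper.

```python
import math


def soft_cosine(a, b, sim):
    """Soft cosine of two equal-length vectors.

    sim[i][j] is an assumed, externally given similarity between
    features i and j (symmetric, with sim[i][i] == 1).
    """
    def form(x, y):
        # Bilinear form: sum over all feature pairs of s_ij * x_i * y_j.
        return sum(sim[i][j] * x[i] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(form(a, a)) * math.sqrt(form(b, b))
    return form(a, b) / denom if denom else 0.0


# Hypothetical feature order: ["play", "game", "chess"].
a = [1.0, 0.0, 0.0]  # a text containing only "play"
b = [0.0, 1.0, 0.0]  # a text containing only "game"

# No similarity between distinct features: identity matrix.
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Assumed similarity of 0.5 between "play" and "game".
related = [[1.0, 0.5, 0.0],
           [0.5, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

print(soft_cosine(a, b, identity))  # 0.0, same as the standard cosine
print(soft_cosine(a, b, related))   # 0.5
```

With the identity matrix the two texts share no features and the measure equals the standard cosine; once "play" and "game" are declared related, the same two vectors receive a non-zero similarity.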
<![CDATA[<b>SIMTEX</b>: <b>An Approach for Detecting and Measuring Textual Similarity based on Discourse and Semantics</b>]]> Automatic systems for detecting and measuring textual similarity are currently being developed for application to different tasks in the field of Natural Language Processing (NLP). Most of these systems use surface linguistic features or statistical information; few researchers use deep linguistic information. In this work, we present an algorithm for detecting and measuring textual similarity that takes into account information offered by discourse relations of Rhetorical Structure Theory (RST), and lexical-semantic relations included in EuroWordNet. We apply the algorithm, called SIMTEX, to texts written in Spanish, but the methodology is potentially language-independent. <![CDATA[<b>Dependency vs. Constituent Based Syntactic N-Grams in Text Similarity Measures for Paraphrase Recognition</b>]]> Paraphrase recognition consists in detecting whether an expression restated as another expression contains the same information. Traditionally, several lexical, syntactic, and semantic techniques are used to solve this problem. For measuring word overlap, most works use n-grams; however, syntactic n-grams have been scarcely explored. We propose using syntactic dependency and constituent n-grams combined with common NLP techniques such as stemming, synonym detection, similarity measures, and linear combination and a similarity matrix built in turn from syntactic n-grams. We measure and compare the performance of our system by using the Microsoft Research Paraphrase Corpus. An in-depth study shows the strengths and weaknesses of each approach, and is accompanied by a common error analysis section. Our main motivation was to determine which syntactic approach performs better for this task: syntactic dependency n-grams, or syntactic constituent n-grams. 
We also compare both approaches with traditional n-grams and state-of-the-art systems. <![CDATA[<b>Paraphrase and Textual Entailment Generation in Czech</b>]]> Paraphrase and textual entailment generation can support natural language processing (NLP) tasks that simulate text understanding, e.g., text summarization, plagiarism detection, or question answering. A paraphrase, i.e., a sentence with the same meaning, conveys a certain piece of information with new words and new syntactic structures. Textual entailment, i.e., an inference that humans will judge most likely true, can employ real-world knowledge in order to make some implicit information explicit. Paraphrases can also be seen as mutual entailments. We present a new system that generates paraphrases and textual entailments from a given text in the Czech language. First, the process is rule-based, i.e., the system analyzes the input text, produces its inner representation, transforms it according to transformation rules, and generates new sentences. Second, the generated sentences are ranked according to a statistical model and only the best ones are output. The decision whether a paraphrase or textual entailment is correct or not is left to humans. For this purpose we designed an annotation game based on a conversation between a detective (the human player) and his assistant (the system). The result of such annotation is a collection of annotated text-hypothesis pairs. Currently, the system and the game are intended to collect data in the Czech language. However, the idea can be applied to other languages. So far, we have collected 3,321 H-T pairs. Of these pairs, 1,563 (47.06%) were judged correct, 1,238 (37.28%) were judged incorrect entailments, and 520 (15.66%) were judged nonsense or unknown. 
<![CDATA[<b>Vector Space Basis Change in Information Retrieval</b>]]> The Vector Space Basis Change (VSBC) is an algebraic operator responsible for change of basis and it is parameterized by a transition matrix. If we change the vector space basis, then each vector component changes depending on this matrix. The strategy of VSBC has been shown to be effective in separating relevant documents and irrelevant ones. Recently, using this strategy, some feedback algorithms have been developed. To build a transition matrix some optimization methods have been used. In this paper, we propose to use a simple, convenient and direct method to build a transition matrix. Based on this method we develop a relevance feedback algorithm. Experimental results on a TREC collection show that our proposed method is effective and generally superior to known VSBC-based models. We also show that our proposed method gives a statistically significant improvement over these models. <![CDATA[<b>Multi-document Summarization using Tensor Decomposition</b>]]> The problem of extractive text summarization for a collection of documents is defined as selecting a small subset of sentences so the contents and meaning of the original document set are preserved in the best possible way. In this paper we present a new model for the problem of extractive summarization, where we strive to obtain a summary that preserves the information coverage as much as possible, when compared to the original document set. We construct a new tensor-based representation that describes the given document set in terms of its topics. We then rank topics via Tensor Decomposition, and compile a summary from the sentences of the highest ranked topics. <![CDATA[<b>Entity Extraction in Biochemical Text using Multiobjective Optimization</b>]]> In this paper we propose a multiobjective modified differential evolution based feature selection and classifier ensemble approach for biochemical entity extraction. 
The algorithm operates in two layers. The first layer is concerned with determining an appropriate set of features for the task within the framework of a supervised statistical classifier, namely, Conditional Random Field (CRF). This produces a set of solutions, a subset of which is used to construct an ensemble in the second layer. The proposed approach is evaluated for entity extraction in chemical texts, which involves the identification of IUPAC and IUPAC-like names and their classification into some predefined categories. Experiments carried out on a benchmark dataset show recall, precision, and F-measure values of 86.15%, 91.29%, and 88.64%, respectively. <![CDATA[<b>On-line and Off-line Chinese-Portuguese Translation Service for Mobile Applications</b>]]> We describe a Chinese-Portuguese translation service, which is integrated in an Android application. The application is also enhanced with technologies such as Automatic Speech Recognition, Optical Character Recognition, Image Retrieval, and Language Detection. This mobile translation application, which is deployed on a portable device, relies by default on a server-based machine translation service, which is not accessible when no Internet connection is available. To provide translation support under this condition, we have developed a contextualized off-line search engine that supports our Chinese-Portuguese machine translation services and allows users to continue using the application. 
<![CDATA[<b>Formal Description of Arabic Syntactic Structure in the Framework of the Government and Binding Theory</b>]]> The research focus in our paper is twofold: (a) to examine the extent to which simple Arabic sentence structures comply with the Government and Binding Theory (GB), and (b) to implement a simple Arabic Context Free Grammar (CFG) parser to analyze input sentence structures to improve some Arabic Natural Language Processing (ANLP) applications. Here we present a parser that employs Chomsky's Government and Binding (GB) theory to better understand the syntactic structure of Arabic sentences. We consider different simple word orders in Arabic and show how they are derived. We analyze different sentence orders including Subject-Verb-Object (SVO), Verb-Object-Subject (VOS), Verb-Subject-Object (VSO), nominal sentences, nominal sentences beginning with inna (and sisters), and question sentences. We tackle the analysis of the structures to develop syntactic rules for a fragment of Arabic grammar. We include two sets of rules: (1) rules on sentence structures that do not account for case and (2) rules on sentence structures that account for the case of Noun Phrases (NPs). We present an implementation of the grammar rules in Prolog. The experiments revealed high accuracy in case assignment in Modern Standard Arabic (MSA) in the light of GB theory, especially when the input sentences are tagged with identification of end cases. <![CDATA[<b>Semantic Hyper-graph Based Representation of Nouns in the Kazakh Language</b>]]> We explain how semantic hyper-graphs are used to describe ontological models of morphological rules of agglutinative languages, with the Kazakh language as a case study. The vertices of these graphs represent morphological features and the edges represent relationships between these features. Such modeling allows a nearly one-to-one translation of the morphology of the language into an object-oriented data model. 
In addition, with such a model we can easily generate new word forms. The constructed model and the dictionary generated with it are freely available for research purposes. <![CDATA[<b>Towards the Automatic Recommendation of Musical Parameters based on Algorithm for Extraction of Linguistic Rules</b>]]> In the present article the authors describe an analysis of data associated with the emotional responses to fractal-generated music. This analysis is done via the discovery of rules, and it constitutes a basis for advancing computer-assisted creativity: our ultimate goal is to create musical pieces by retrieving the right set of parameters associated with a target emotion. This paper contains the description of (i) variables associated with fractal music and emotions; (ii) the data gathering method to obtain the tuples relating input parameters and emotional responses; (iii) the rules that were discovered by using the LR-FIR algorithm. Even though similar experiments intended to elucidate emotional responses to music have been reported, this study stands out because it establishes a connection between fractal-generated music and emotional responses, all with the purpose of advancing computer-assisted creativity.