Scielo RSS <![CDATA[Computación y Sistemas]]> vol. 22 num. 1 lang. en <![CDATA[Introduction to the Thematic Issue on Language & Knowledge Engineering]]> <![CDATA[Calculating the Upper Bounds for Multi-Document Summarization using Genetic Algorithms]]> Abstract: Over the last years, several Multi-Document Summarization (MDS) methods have been presented at the Document Understanding Conference (DUC) workshops. Since DUC01, methods have been presented in approximately 268 state-of-the-art publications, allowing the continuous improvement of MDS; however, in most works the upper bounds were unknown. Recently, some works have focused on calculating the best sentence combinations of a set of documents, and in previous works we calculated the significance for the single-document summarization task on the DUC01 and DUC02 datasets. However, no significance analysis has been performed for the MDS task to rank the best multi-document summarization methods. In this paper, we describe a Genetic Algorithm-based method for calculating the best sentence combinations of the DUC01 and DUC02 datasets in MDS through a meta-document representation. Moreover, we have calculated three heuristics mentioned in several state-of-the-art works to rank the most recent MDS methods, through the calculation of upper and lower bounds. <![CDATA[Idiom Polarity Identification using Contextual Information]]> Abstract: Identifying the polarity of a given text is a complex task that usually requires an analysis of the contextual information. This task becomes much more complex when the analysis considers textual components smaller than paragraphs, such as sentences, phraseological units, or single words. In this paper, we consider the automatic identification of polarity for linguistic units known as idioms, based on their contextual information.
Idioms are phraseological units made up of more than two words, in which one of those words plays the role of the predicate. We employ three lexicons for determining the polarity of the words surrounding the idiom, i.e., its context, and use this information to infer the probable polarity of the target idiom. The lexicons we use are the ElhPolar dictionary, iSOL, and the ML-SentiCON Spanish sentiment lexicon, all of which contain the polarity of different words. One aim of this research is to identify the lexicon that provides the best results for the proposed task, which is to count the number of positive and negative words in the idiom's context so that we can infer the polarity of the idiom itself. The experiments carried out show that the best combination obtains results close to 57.31% when the texts are lemmatized and 48.87% when they are not. <![CDATA[Analyzing Polemics Evolution from Twitter Streams Using Author-Based Social Networks]]> Abstract: The construction of social network graphs from online network data has nowadays become a common way to analyze these data. Typical research questions in this domain are related to profile building, interest recommendation, and trending-topic prediction. However, little work has been devoted to the analysis of the evolution of very short and unpredictable events, called polemics. Moreover, experts do not use tools from social network graph analysis and classical graph theory for this purpose. This article shows that such an analysis leads to a colossal amount of data collected from public social sources like Twitter. The main problem is collecting enough evidence about a non-predictable event, which requires capturing a complete history before and during the course of the event, and processing it. To cope with this problem, while waiting for an event we captured social data without filtering it, which required more than a TB of disk space.
Then, we conduct a time-related social network analysis, dedicated to the study of the evolution of actor interactions using time series built from a total of 33 graph-theory metrics. A Big Data pipeline allows us to validate these techniques on a complex dataset of 284 million tweets, analyzing 56 days of the Volkswagen scandal [12]. <![CDATA[Stylometry-based Approach for Detecting Writing Style Changes in Literary Texts]]> Abstract: In this paper, we present an approach to identify changes in the writing style of 7 authors of novels written in English. We defined 3 stages of writing for each author; each stage contains 3 novels with a maximum of 3 years between publications. We propose several stylometric features to represent the novels in a vector space model. We use supervised learning algorithms to determine whether, by means of this stylometry-based representation, it is possible to identify to which stage of writing each novel belongs. <![CDATA[Extraction of Code-mixed Aspect Topics in Semantic Representation]]> Abstract: With the recent advancement and popularity of social networking forums, millions of people virtually connected to the World Wide Web commonly communicate in multiple languages. This has led to the generation of large volumes of unstructured code-mixed social media text in which useful aspects of information are highly dispersed. Aspect-based opinion mining relates opinion targets to their polarity values in a specific context. Since aspects are often implicit, detecting and retrieving them is a difficult task. Moreover, it is very challenging because code-mixed social media text suffers from its associated linguistic complexities. Topic modeling has the potential to extract aspects pertaining to opinion data from large texts, resulting not only in the retrieval of implicit aspects but also in clustering them together.
In this paper we propose a knowledge-based, language-independent code-mixed semantic LDA (lcms-LDA) model, with the aim of improving the coherence of clusters. We find that the proposed lcms-LDA model infers topic distributions without a language barrier, based on the semantics associated with words. Our experimental results show an increase in the UMass and KL-divergence scores, indicating improved coherence and distinctiveness of aspect clusters in comparison with state-of-the-art techniques for aspect extraction from code-mixed data. <![CDATA[Character Embedding for Language Identification in Hindi-English Code-mixed Social Media Text]]> Abstract: Social media platforms are now widely used by people to express their opinions and interests. The language used on social media was once purely English; code-mixed text, i.e., the mixing of two or more languages, is now common. In code-mixed data, one language is written using another language's script, so to process such text, identifying the language of each word is important. The main objective of this work is to propose a technique for identifying the language of Hindi-English code-mixed data from three social media platforms: Facebook, Twitter, and WhatsApp. The classification of Hindi-English code-mixed data into Hindi, English, Named Entity, Acronym, Universal, Mixed (Hindi along with English), and Undefined tags was performed. Popular word embedding features were used to represent each word. Two kinds of embedding features were considered: word-based embedding features and character-based context features. The proposed method adds context information to the embedding features. A well-known machine learning classifier, the Support Vector Machine, was used to train and test the system.
The work on language identification in code-mixed text using character-based embeddings is a novel approach and shows promising results. <![CDATA[A Heuristic Approach to Detect and Localize Text in Arabic News Video]]> Abstract: Automatic text detection in video sequences remains a challenging problem due to the variety of sizes and colors and the presence of complex backgrounds. In this paper, we attempt to solve this problem by proposing a robust detection-validation scheme for text localization in Arabic news video. Candidate text regions are first detected using a hybrid method that combines the MSER detector and edge information. Then, these regions are grouped using morphological operators. Finally, a verification process is applied to remove noisy non-text regions, including specific features for Arabic text. The performance and efficacy of the proposed text detection approach have been tested using the Arabic-Text-in-Video database (AcTiV-DB). <![CDATA[Generating Aspect-based Extractive Opinion Summary: Drawing Inferences from Social Media Texts]]> Abstract: This paper presents an integrated framework to generate extractive aspect-based opinion summaries from a large volume of free-form text reviews. The framework has three major components: (a) an aspect identifier to determine the aspects in a given domain; (b) a sentiment polarity detector for computing the sentiment polarity of opinions about an aspect; and (c) a summary generator to produce the opinion summary. The framework is evaluated on the SemEval-2014 dataset and obtains better results than several other approaches. <![CDATA[New Similarity Function for Scientific Articles Clustering based on the Bibliographic References]]> Abstract: The amount of scientific information available on the Internet, corporate intranets, and other media is growing rapidly. Managing knowledge from the information found in scientific publications is essential for any researcher.
The management of scientific information is increasingly complex and challenging, since document collections are generally heterogeneous, large, diverse, and dynamic. Overcoming these challenges is essential to give scientists the best conditions to manage the time required to process scientific information. In this work, we implemented a new similarity function for clustering scientific articles based on the information provided by the articles' references. The use of this function contributes significantly to discovering relevant knowledge in the scientific literature. <![CDATA[Inferences for Enrichment of Collocation Databases by Means of Semantic Relations]]> Abstract: A text consists of words that are syntactically linked and semantically combinable, like "political party," "pay attention," or "stone cold." Such semantically plausible combinations of two content words, which we hereafter refer to as collocations, are important knowledge in many areas of computational linguistics. We present the structure of a lexical resource that provides such knowledge: a collocation database (CDB). Since such databases cannot be complete under any reasonable compilation procedure, we consider heuristic-based inference mechanisms that predict new plausible collocations based on the ones present in the CDB, with the help of a WordNet-like thesaurus: if an available collocation combines the entries A and B, and B is 'similar' to C, then A and C are supposed to constitute a collocation of the same category. We also describe the semantically induced morphological categories suited for such inference, as well as heuristics for filtering out wrong hypotheses. We discuss the inference experience obtained with the CrossLexica CDB. <![CDATA[Automatic Theorem Proving for Natural Logic: A Case Study on Textual Entailment]]> Abstract: Recognizing Textual Entailment (RTE) is a Natural Language Processing task.
It is very important in tasks such as semantic search and text summarization. There are many approaches to RTE, for example, methods based on machine learning, linear programming, probabilistic calculus, optimization, and logic. Unfortunately, none of them can explain why the entailment holds. With Natural Logic, we can reason from the syntactic part of a natural language expression and very little semantic information. This paper presents an automatic theorem prover for Natural Logic that makes it possible to know precisely the relationships needed to reach the entailment in a class of natural language expressions. <![CDATA[An Overview of Ontology Learning Tasks]]> Abstract: Ontology Learning (OL) for the Semantic Web has become widely used for knowledge representation. The success of the Semantic Web depends strongly on the proliferation of ontologies, which requires a fast and sound ontology engineering and learning process in order to provide an efficient knowledge acquisition service. The vision of ontology learning includes a number of complementary disciplines that feed on different types of unstructured, semi-structured, and fully structured data in order to support a semi-automatic, cooperative ontology engineering process. This article presents a general review of work related to the types and tasks involved in OL. These works consider fundamental types of ontology learning, schema extraction, creation and population, as well as evaluation methods and tools. <![CDATA[A metric for the Evaluation of Restricted Domain Ontologies]]> Abstract: In this article we propose a metric for the automatic evaluation of restricted-domain ontologies. The metric is defined in terms of the evaluation of different lexico-syntactic, statistical, and semantic approaches.
One syntactic approach employed is the use of lexico-syntactic patterns; other approaches, such as grouping by formal concept analysis, similarity, latent semantic analysis, and dependency graphs, are used as well. These approaches rely on reference corpora to find evidence of the validity of the concepts and semantic relationships stored in the target ontology. The proposed evaluation approach provides a score obtained through the metric, based on the accuracy measure used for each evaluated ontology. The score is associated with the ontology's quality and is given with a certain degree of reliability, obtained by comparing the results against the evaluation of human experts and a baseline. <![CDATA[A Workflow Ontology to Support Knowledge Management in a Group’s Organizational Structure]]> Abstract: In CSCW (Computer-Supported Cooperative Work), managing the group's organizational structure makes it possible to control how group members communicate, collaborate, and coordinate to achieve a common goal, in order to benefit an organization or a community. Consequently, establishing an appropriate model for managing this structure is very important, as it can serve as a guide for implementing these kinds of systems. This modeling must be flexible enough to adapt to changes within the group and to the different working styles of several groups, as well as to formally support a knowledge base, helping to eliminate ambiguity and redundancy. Therefore, this modeling must formally provide a knowledge representation in order to specify the elements and control the set of ordered steps in an organizational structure. Thus, a workflow ontology to control such a structure is proposed in this paper.
The workflow manages and controls the process via a set of steps ordered and executed by different organizational entities, whereas the ontology specifies the domain knowledge through concepts, relations, axioms, and instances in a formal, explicit way. A case study demonstrating the knowledge management of a group's organizational structure through the workflow ontology is shown. <![CDATA[A Neighborhood Combining Approach in GRASP's Local Search for Quadratic Assignment Problem Solutions]]> Abstract: In this paper we describe a study of the search for solutions to the Quadratic Assignment Problem (QAP), a combinatorial optimization problem, through the implementation of a Greedy Randomized Adaptive Search Procedure (GRASP); the solutions obtained have been compared with the best known in the literature, with robust results in terms of the value of the objective function and the execution time. A comparison with an ant algorithm is also presented in order to compare the metaheuristics. The most important contribution of this paper is the combination of different neighborhood structures in the GRASP improvement phase. The experiments were performed on a set of test instances available in QAPLIB. The QAP belongs to the NP-hard class, which is why this approximation algorithm is implemented. <![CDATA[Depth-First Reasoning on Trees]]> Abstract: The μ-calculus is an expressive modal logic with least and greatest fixed-point operators. This formalism encompasses many temporal, program, and description logics, and it has been widely applied in a broad range of domains, such as program verification, knowledge representation, and concurrent pervasive systems. In this paper, we propose a satisfiability algorithm for the μ-calculus extended with converse modalities and interpreted on unranked trees. In contrast with known satisfiability algorithms, our proposal is based on a depth-first search.
We prove the algorithm to be correct (sound and complete) and optimal, and we describe an implementation. The extension of the μ-calculus with converse modalities allows us to efficiently characterize standard reasoning problems (emptiness, containment, and equivalence) for XPath queries. We also describe several query reasoning experiments, which show our proposal to be competitive in practice with known implementations. <![CDATA[Application of Multi-Criteria Decision Analysis to the Selection of Software Measures]]> Abstract: In this research we propose the application of multi-criteria decision analysis to make documented and transparent decisions about the selection of software measures. The Pareto dominance method was used to narrow down the initial list of measures. Multi-attribute value theory was applied to rank the final set of measures. As a result, about 40% of the initial measures were eliminated, and the final list of measures was ranked. <![CDATA[Integration of Visualization Techniques to Algorithms of Optimization of the Metaheuristics Ant Colony]]> Abstract: A search guided by a user contributes to solving optimization problems, but no adequate mechanisms are known for achieving this interaction in algorithms that use the Ant Colony Optimization (ACO) metaheuristic. This paper proposes a model for integrating visualization techniques into these algorithms that allows the user to interact with the search in real time and guide it. A software tool implementing the proposed model was developed to solve the Traveling Salesman Problem (TSP) with an ACO algorithm. An experimental analysis with the developed tool was performed, and the results showed the efficiency of the model, finding better solutions to TSP problems in less time. <![CDATA[Interactive System for the Analysis of Academic Achievement at the Upper-Middle Education in Mexico]]> Abstract: In recent years, there has been interest in finding new ways to analyze and process data from different sources.
One of these ways is user-centered data mining, based on the fundamentals of usability engineering and accessibility. The academic achievement in Language and Communication and in Mathematics of students in upper-middle education in Mexico was analyzed through a partitional clustering algorithm. A variety of achievement levels were observed, with Insufficient and Elementary predominating in the evaluated population, while Good and Excellent levels were reached by a reduced number of schools. This reveals a notable difference between students' achievements, leading some to delay or abandon their university studies because they obtain a certificate without the knowledge needed to pass college entrance exams. <![CDATA[A Storage Pattern-based Heuristic Algorithm for Solving Instances of Hard28 Datasets for the Bin Packing Problem]]> Abstract: In this paper, we propose a heuristic algorithm that obtains the optimal solution for 5 instances of the Hard28 instance set for the one-dimensional bin packing problem (1DBPP). The algorithm is based on the storage patterns of objects in containers. To detect how objects are stored in containers, the HGGA-BP algorithm [8] was used. A tool for monitoring and analyzing the HGGA-BP algorithm was also developed; with the help of the user, this tool monitors and analyzes the intermediate solutions generated by HGGA-BP [8]. The proposed algorithm uses the inherent characteristics of the objects; that is, the weight of an object in the Hard28 instances can be a prime, even, or odd number, and the weights of some objects are larger than half the capacity of the containers. The Hard28 set consists of 28 instances, and the optimal value was found in 5 of them. For 19 instances, the solution obtained was one container away from the optimum.
For 3 instances, the solutions were two containers away from the optimum, and in one of the obtained solutions, three containers were missing to reach the optimum. For each of the optimal solutions found, the computation time is less than or equal to one millisecond. <![CDATA[Experimental Platform for Intelligent Computing (EPIC)]]> Abstract: This paper presents the architecture and user interface of a novel Experimental Platform for Intelligent Computing (EPIC). Unlike the two most popular platforms (WEKA and KEEL), the proposed EPIC tool has a very friendly user interface and offers several advantages over existing tools for intelligent computing experiments. In particular, EPIC handles mixed and incomplete data directly, without preprocessing, and its architecture supports multi-target supervised classification and regression. It also contains a module for two-dimensional dataset visualization, which includes visualizing the decision boundary for several supervised learning algorithms. <![CDATA[Is Natural User Interaction Really Natural? An Evaluation of Gesture-Based Navigating Techniques in Virtual Environments]]> Abstract: Many interaction techniques have been developed for virtual worlds, including the use of novel devices. Technological development has brought us to a time when interaction devices are no longer available only to high-technology laboratories. In this context, we can now develop solutions for natural user interfaces, and their massive adoption presents research challenges. In this paper we analyze the use of gesture-based interaction for navigating virtual worlds. To this end, we created a virtual world and contrasted interactive interfaces based on hand or body gestures with interaction based on mouse and keyboard. The results indicate that the "natural" interaction is not truly natural, even when we imitate what we do in real life.
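As background for the one-dimensional bin packing (1DBPP) abstract above: the problem asks for the minimum number of capacity-limited containers needed to store a set of weighted objects. A minimal sketch of a standard baseline heuristic, first-fit decreasing, follows; this is an illustrative textbook method, not the storage-pattern or HGGA-BP algorithm the paper describes.

```python
def first_fit_decreasing(weights, capacity):
    """Pack weights into bins of the given capacity using first-fit decreasing.

    A classic 1DBPP baseline: sort items by weight (largest first) and place
    each item into the first bin that still has enough remaining room.
    """
    bins = []  # each bin is a list of item weights, with sum(bin) <= capacity
    for w in sorted(weights, reverse=True):
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            bins.append([w])  # no existing bin fits this item: open a new bin
    return bins

# Example: four items packed into bins of capacity 10
packing = first_fit_decreasing([7, 6, 4, 3], capacity=10)
print(len(packing))  # prints 2, e.g. bins [7, 3] and [6, 4]
```

First-fit decreasing is a simple approximation; exact or near-optimal solvers for hard instance sets such as Hard28 (like the grouping genetic algorithm referenced above) go well beyond this greedy scheme.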
<![CDATA[Recognition and Classification of Sign Language for Spanish]]> Abstract: This paper presents a computational system for the recognition and classification of letters of Spanish sign language, designed to help deaf-mute people communicate with other persons. A low-cost glove that captures hand movements has been constructed. It contains an accelerometer for each finger, which allows detecting the finger's position by means of a data acquisition board. Sensor information is sent wirelessly to a computer with a software interface, developed in LabVIEW, in which the symbol dataset is generated. For the automatic recognition of letters, we applied a statistical treatment to the dataset, obtaining accuracy greater than 96% independently of the user. <![CDATA[Visualization in a Data Mining Environment from a Human Computer Interaction Perspective]]> Abstract: With the aim of providing support to the design, analysis, and evaluation of result visualization mechanisms used to supply information in data mining environments, this work analyzes the visualization issue from a Human-Computer Interaction perspective. Important considerations arising from the study of human perception are taken into account. Three practical examples based on numerical, textual, and georeferenced data are described by means of the KNIME Analytics tool.
In addition, the use and importance of graphics for a correct interpretation of the information are emphasized. <![CDATA[Microcalcifications Detection using Image Processing]]> Abstract: Breast cancer is the most common cause of death in women and the second leading cause of cancer deaths worldwide. Primary prevention in the early stages of the disease is complex, as the causes remain largely unknown. However, some typical signatures of this disease, such as masses and microcalcifications appearing on mammograms, can be used to improve early diagnostic techniques, which is critical for women's quality of life. X-ray mammography is the main test used for screening and early diagnosis, and its analysis and processing are key to improving breast cancer prognosis. In this work, an effective methodology to detect microcalcifications in digitized mammograms is presented, based on the synergy of image processing, pattern recognition, and artificial intelligence. The methodology consists of four stages: image selection; image enhancement and feature extraction based on mathematical morphology operations applying coordinate logic filters; image segmentation based on partitional clustering methods such as k-means and self-organizing maps; and finally classification with an artificial metaplasticity multilayer perceptron. The proposed system constitutes a promising approach for the detection of microcalcifications, and the experimental results show that the methodology can locate them efficiently. The best values obtained are an accuracy of 99.93% and a specificity of 99.95%; these results are very competitive with those reported in the state of the art. <![CDATA[Vision System for the Navigation of a Mobile Robot]]> Abstract: In this paper, the development of an object detection system in a controlled two-dimensional space using computer vision techniques is presented.
The detected objects have rigid geometry and are exposed to real light; therefore, the system is robust to changes in lighting and shading. In order to handle the large amount of data to be processed in real time, a myRIO device containing an FPGA is used. This device communicates with the LabVIEW software, where the user interface resides. Using LabVIEW, a tracking-by-color algorithm is implemented to serve a reactive agent, which uses an infrared sensor to detect the distance to an obstacle and performs foraging and storage functions. In order to improve performance, a supervisory system was implemented using a Kinect device that provides information on the position of the objects in the test area. This information makes it possible to eliminate occlusion problems.
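The tracking-by-color step in the abstract above reduces, at its core, to thresholding pixels within a target color range and locating the centroid of the matching blob. A minimal, library-free sketch of that idea on a synthetic RGB frame follows; the actual system implements this in LabVIEW on a myRIO/FPGA, and the threshold values here are illustrative assumptions, not values from the paper.

```python
def track_by_color(frame, lo, hi):
    """Return the centroid (row, col) of pixels whose (R, G, B) values fall
    within the inclusive per-channel range [lo, hi], or None if no pixel
    matches. `frame` is a nested list: frame[row][col] = (r, g, b).
    """
    row_sum, col_sum, n = 0.0, 0.0, 0
    for r, row in enumerate(frame):
        for c, (R, G, B) in enumerate(row):
            # Per-channel threshold test: is this pixel in the target range?
            if lo[0] <= R <= hi[0] and lo[1] <= G <= hi[1] and lo[2] <= B <= hi[2]:
                row_sum += r
                col_sum += c
                n += 1
    if n == 0:
        return None  # target color not visible in this frame
    return (row_sum / n, col_sum / n)

# Synthetic 4x4 frame: black background with a red 2x2 blob at rows 1-2, cols 1-2
black, red = (0, 0, 0), (220, 30, 30)
frame = [[red if 1 <= r <= 2 and 1 <= c <= 2 else black for c in range(4)]
         for r in range(4)]
print(track_by_color(frame, lo=(150, 0, 0), hi=(255, 80, 80)))  # (1.5, 1.5)
```

A real pipeline would typically threshold in HSV space for robustness to the lighting changes the abstract mentions, and would filter small connected components before computing the centroid; this sketch keeps only the core threshold-and-centroid computation.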