Scielo RSS <![CDATA[Polibits]]> vol. num. 54 lang. en

<![CDATA[Editorial]]>

<![CDATA[Filtering Compromised Environment Sensors Using Autoregressive Hidden Markov Model]]>
Abstract: We propose a method based on autoregressive hidden Markov models (AR-HMM) for filtering out compromised nodes from a sensor network. We assume that each sensor is healthy, self-healing, or corrupted, and that each node submits a number of readings. A different AR-HMM (A, B, π) is used to describe each of the three types of nodes. For each node, we train an AR-HMM on the sensor's readings; the B matrices of the trained AR-HMMs are then clustered into two groups, healthy and compromised (both self-healing and corrupted), which permits us to identify the group of healthy sensors. The existing algorithms are centralized and computationally intensive. Our approach is a simple, decentralized model that identifies compromised nodes at a low computational cost. Simulations using both synthetic and real datasets show greater than 90% accuracy in identifying healthy nodes with ten-node datasets and as high as 97% accuracy with datasets of 500 or more nodes.

<![CDATA[Robust Spoken Language Understanding for House Service Robots]]>
Abstract: Service robotics has been growing significantly in recent years, leading to several research results and to a number of consumer products. One of the essential features of these robotic platforms is the ability to interact with users through natural language. Spoken commands can be processed by a Spoken Language Understanding chain in order to obtain the desired behavior of the robot. The entry point of such a process is an Automatic Speech Recognition (ASR) module, which provides a list of transcriptions for a given spoken utterance. Although several well-performing ASR engines are available off the shelf, they operate in a general-purpose setting.
Hence, they may not be well suited to recognizing utterances given to robots in specific domains. In this work, we propose a practical yet robust strategy to re-rank lists of transcriptions. This approach improves the quality of ASR systems in situated scenarios, i.e., the transcription of robotic commands. The proposed method relies on evidence derived from a semantic grammar with semantic actions, designed to model typical commands expressed in scenarios that are specific to human service robotics. The experimental evaluation shows that the approach effectively outperforms the ASR baseline, which selects the first transcription suggested by the ASR.

<![CDATA[PC Based Open Control Architecture for Mechatronic Systems]]>
Abstract: In this paper, an open control architecture for low-cost mechatronic systems is designed based on a personal computer (PC). The architecture offers the flexibility, reconfigurability, and versatility to carry out a broad variety of tasks in a simple manner.
This architecture can support theoretical and practical teaching in some engineering and postgraduate courses. In addition, it can be useful in research for fast experimental implementation of diverse control laws. Simulation and experimental results show the performance of the open control architecture on a SCARA robot with computed-torque control for trajectory tracking.

<![CDATA[Business Process Models Clustering Based on Multimodal Search, K-means, and Cumulative and No-Continuous N-Grams]]>
Abstract: Due to the large volume of process repositories, finding a particular process may become a difficult task. This paper presents a method for indexing, searching, and grouping business process models. The method considers linguistic and behavior information for modeling the business process. Behavior information is described using cumulative and no-continuous n-grams. The grouping method is based on the k-means algorithm and uses suffix arrays to define labels for each group. The clustering approach incorporates mechanisms for avoiding overlap and improving the homogeneity of the groups created by the k-means algorithm. The results obtained exceed the precision, recall, and F-measure of previous approaches.

<![CDATA[Cross-Language Information Retrieval with Incorrect Query Translations]]>
Abstract: In this paper, we present a Cross Language Information Retrieval (CLIR) approach using corpus-driven query suggestion. We use corpus statistics to gather clues for selecting the right query terms when the translation of a specific query is missing or incorrect. The derived set of queries is ranked to select the top-ranked queries. These top-ranked queries are then used to perform query formulation. Using the reformulated weighted query, we perform cross-language information retrieval. We compare the results of a CLIR system that uses Google translation of user queries with those of CLIR using the proposed query suggestion approach.
We used the English and Tamil corpora of the FIRE 2012 dataset and analyzed the effects of the proposed approach. The experimental results show that the proposed approach performs well when queries are translated incorrectly.

<![CDATA[Unsupervised Word Sense Disambiguation Using Alpha-Beta Associative Memories]]>
Abstract: We present an alternative to the use of overlap as a distance measure in the simple Lesk algorithm. This paper presents an algorithm that uses Alpha-Beta associative memories of type Max and Min to measure a given ambiguous word's meaning in relation to its context, assigning to the word the meaning that is most related. The principal advantage of this algorithm is its ability to deal with inflectional and derivational forms of words, which makes it possible to bypass the stemming of the words involved in the disambiguation process. Different experiments were performed with two parameters as variables: the context window size, and whether or not stemming was applied. The experimental results (F1-score) show that our algorithm performs better than the overlap metric in the simple Lesk algorithm. Moreover, the experiments show that as more information is added to the sense or meaning and the overlap metric is used, the precision of the simple Lesk algorithm decreases, in contrast to the performance of our algorithm.

<![CDATA[A Method Based on Genetic Algorithms for Generating Assessment Tests Used for Learning]]>
Abstract: Tests are used in a variety of contexts, in learning that takes place every day and everywhere. They are a specific method in the process of assessment (evaluation), which is an important part of the educational activity.
Building an optimized sequence of tests (SOT) from a group of tests on the same subject, under restrictions that reflect the evaluator's wishes, can be a time-consuming task, because the restrictions can vary and the number of tests can be high. To address this, this paper presents a method of generating optimized sequences of tests within a battery of tests using a genetic algorithm. We associate a number of representative keywords with each test. The user expresses the restriction by choosing a number of keywords that best approximate the subject to be tested. The genetic algorithm helps in finding the optimized solutions and uses fewer hardware resources.

<![CDATA[IN-DEDUCTIVE and DAG-Tree Approaches for Large-Scale Extreme Multi-label Hierarchical Text Classification]]>
Abstract: This paper presents a large-scale extreme multi-label hierarchical text classification method that employs a large-scale hierarchical inductive learning and deductive classification (IN-DEDUCTIVE) approach using different efficient classifiers, and a DAG-Tree that refines the given hierarchy by eliminating nodes and edges to generate a new hierarchy. We evaluate our method on the standard hierarchical text classification datasets prepared for the PASCAL Challenge on Large-Scale Hierarchical Text Classification (LSHTC). We compare several classification algorithms on LSHTC, including DCD-SVM, SVMperf, Pegasos, SGD-SVM, and Passive Aggressive. Experimental results show that systems based on the IN-DEDUCTIVE approach with DCD-SVM, SGD-SVM, and Pegasos are promising and outperformed the other learners as well as the top systems that participated in the LSHTC3 challenge on the Wikipedia medium dataset. Furthermore, the DAG-Tree based hierarchy is effective especially for very large datasets, since the DAG-Tree exponentially reduces the amount of computation necessary for classification.
Our system with the IN-DEDUCTIVE and DAG-Tree approaches outperformed the top systems that participated in the LSHTC4 challenge on the Wikipedia large dataset.

<![CDATA[Instance Selection to Improve Gamma Classifier]]>
Abstract: Pre-processing the dataset is an important stage in the Knowledge Discovery in Datasets (KDD) process. Filtering noise through instance selection is a necessary task, because it reduces the risk of using misclassified and non-representative instances to train supervised classifiers. This study aims at improving the performance of the Gamma associative classifier by introducing a novel similarity function to guide instance selection. Experiments over 15 datasets cover several instance selection methods, and their influence on the performance of the Gamma classifier is analyzed. The effectiveness of the proposed similarity function is tested, obtaining good results in terms of classifier accuracy and instance retention ratio.

<![CDATA[Understanding Human Preferences for Summary Designs in Online Debates Domain]]>
Abstract: Research on automatic text summarization has primarily focused on summarizing news, web pages, scientific papers, etc. While in some of these text genres it is intuitively clear what constitutes a good summary, the issue is much less clear-cut in social media scenarios like online debates, product reviews, etc., where summaries can be presented in many ways. As yet, there is no analysis of which summary representation is favored by readers. In this work, we empirically analyze this question and elicit readers' preferences for the different designs of summaries for online debates. Seven possible summary designs in total were presented to 60 participants via online channels. Participants were asked to read each summary design and assign it a preference score. The results indicated that the combination of Chart Summary and Side-By-Side Summary is the most preferred summary design.
This finding is important for future work in automatic text summarization of online debates.
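
The word sense disambiguation abstract above takes the overlap measure of the simple Lesk algorithm as its baseline. As a rough illustration of that baseline only (not of the Alpha-Beta associative memory method, whose details the abstract does not give), the following minimal Python sketch picks the sense whose gloss shares the most words with the context. The function names, the stopword list, and the two glosses for "bank" are hypothetical and chosen for illustration; they do not come from the paper.

```python
import re

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "as", "of", "on", "she", "such", "that", "the"}


def tokenize(text):
    """Lowercase the text and return its set of content words (no stemming)."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def simple_lesk(context, sense_glosses):
    """Return (sense, overlap) for the gloss sharing the most words with the context.

    Ties are resolved in favor of the sense listed first.
    """
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


# Hypothetical toy glosses for the ambiguous word "bank".
BANK_SENSES = {
    "bank_financial": "an institution that accepts deposits of money and makes loans",
    "bank_river": "sloping land beside a body of water such as a river",
}

if __name__ == "__main__":
    print(simple_lesk("she sat on the bank of the river and watched the water", BANK_SENSES))
    print(simple_lesk("boats drifted along the rivers", BANK_SENSES))
```

In the second call the inflected form "rivers" does not match "river" in the gloss, so the overlap is zero for both senses. Exact-match overlap depends on stemming in this way, which is the limitation the abstract says its method is designed to avoid.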