SciELO RSS <![CDATA[Polibits]]> num. 47

<![CDATA[<b>Preface</b>]]>

<![CDATA[<b><i>N</i>-gram Parsing for Jointly Training a Discriminative Constituency Parser</b>]]> Syntactic parsers are designed to detect the complete syntactic structure of grammatically correct sentences. In this paper, we introduce the concept of n-gram parsing, which corresponds to generating the constituency parse tree of n consecutive words in a sentence. We create a stand-alone n-gram parser derived from a baseline full discriminative constituency parser and analyze the characteristics of the generated n-gram trees for various values of n. Since the produced n-gram trees are in general smaller and less complex than full parse trees, it is likely that n-gram parsers are more robust than full parsers. Therefore, we use n-gram parsing to boost the accuracy of a full discriminative constituency parser in a hierarchical joint learning setup. Our results show that the full parser jointly trained with an n-gram parser performs statistically significantly better than our baseline full parser on the English Penn Treebank test corpus.

<![CDATA[<b>Automatic WordNet Construction Using Markov Chain Monte Carlo</b>]]> WordNet is used extensively as a major lexical resource in information retrieval tasks. However, the quality of existing Persian WordNets is far from perfect. They are either constructed manually, which limits the coverage of Persian words, or automatically, which results in unsatisfactory precision. This paper presents a fully automated approach for constructing a Persian WordNet: a Bayesian model with Markov chain Monte Carlo (MCMC) estimation. We model the problem of constructing a Persian WordNet by estimating the probability of assigning senses (synsets) to Persian words.
By applying MCMC techniques to estimate these probabilities, we integrate prior knowledge into the estimation and use the expected value of the generated samples as the final estimates. This yields a considerable performance improvement compared with maximum-likelihood and expectation-maximization methods. The resulting WordNet has a precision of 90.46%, which is a considerable improvement over automatically built Persian WordNets.

<![CDATA[<b>Exploration on Effectiveness and Efficiency of Similar Sentence Matching</b>]]> Similar sentence matching is an essential issue for many applications, such as text summarization, image extraction, social media retrieval, question answering, and so on. A number of studies have investigated this issue in recent years. Most such techniques focus on effectiveness, and only a few address efficiency. In this paper, we address both effectiveness and efficiency in sentence similarity matching. For a given sentence collection, we determine how to effectively and efficiently identify the top-k sentences that are semantically most similar to a query. To achieve this goal, we first study several representative sentence similarity measurement strategies, from which we select the best-performing ones through cross-validation and dynamic weight tuning. The experimental evaluation demonstrates the effectiveness of our strategy. Moreover, regarding efficiency, we introduce several optimization techniques to improve the performance of the similarity computation. The trade-off between effectiveness and efficiency is further explored through extensive experiments.

<![CDATA[<b>TopicSearch-Personalized Web Clustering Engine Using Semantic Query Expansion, Memetic Algorithms and Intelligent Agents</b>]]> As more and more resources become available on the Web, the difficulty of finding the desired information increases.
Intelligent agents can assist users in this task since they can search, filter and organize information on behalf of their users. Web document clustering techniques can also help users to find pages that meet their information requirements. This paper presents a personalized web document clustering engine called TopicSearch. TopicSearch introduces a novel inverse document frequency function to improve the query expansion process, a new memetic algorithm for web document clustering, and a frequent-phrases approach for defining cluster labels. Each user query is handled by an agent that coordinates several tasks, including query expansion, search results acquisition, preprocessing of search results, cluster construction and labeling, and visualization. These tasks are performed by specialized agents whose execution can be parallelized in certain instances. The model was successfully tested on fifty DMOZ datasets. The results demonstrated improved precision and recall over traditional algorithms (k-means, Bisecting k-means, STC, and Lingo). In addition, the presented model was evaluated by a group of twenty users, 90% of whom were in favor of the model.

<![CDATA[<b>Recommending Machine Translation Output to Translators by Estimating Translation Effort</b>: <b>A Case Study</b>]]> In this paper, we use the statistics provided by a field experiment to explore the utility of supplying machine translation suggestions in a computer-assisted translation (CAT) environment. Regression models are trained for each user in order to estimate the time to edit (TTE) for the current translation segment. We use a combination of features from the current segment and aggregated features from previously translated segments selected with content-based filtering approaches commonly used in recommendation systems. We present and evaluate decision function heuristics to determine whether machine translation output will be useful for the translator in the given segment.
We find that our regression models do a reasonable job for some users in predicting TTE given only a small number of training examples, although noise in the actual TTE for seemingly similar segments yields large error margins. We propose to include the estimation of TTE in CAT recommendation systems as a well-correlated metric for translation quality.

<![CDATA[<b>Scene Boundary Detection from Movie Dialogue</b>: <b>A Genetic Algorithm Approach</b>]]> Movie scripts are a rich textual resource that can be tapped for movie content analysis. This article describes a mechanism for fragmenting a sequence of movie script dialogue into scene-wise groups. In other words, it attempts to locate scene transitions using information acquired from a sequence of dialogue units. We collect movie scripts from a web archive and preprocess them to develop a resource of dialogues. We feed the dialogue sequence from a script to a Genetic Algorithm (GA) framework. The system fragments the sequence into adjacent groups of dialogue units, or output 'scenes'. We use SentiWordNet scores and WordNet distance for dialogue units to optimize this grouping so that adjacent scenes are semantically most dissimilar. Then we compare the resulting fragmented dialogue sequence with the original scene-wise alignment of dialogue in the script.

<![CDATA[<b>Efficient Routing of Mobile Agents in a Stochastic Network</b>]]> Mobile agents are autonomous programs that may be dispatched through computer networks. Using a mobile agent is a potentially efficient method to perform transactions and retrieve information in networks. Unknown congestion in a network causes uncertainty in the routing times of mobile agents, so the routing of mobile agents cannot rely solely on the average travel time. In this paper, we deal with a given stochastic network in which the mobile agent routing time is a random variable.
Given pre-specified values R and PR, the objective is to find the path with the minimum expected time under the constraint that the probability that the path time is less than R is at least PR. We show that this problem is NP-hard, and construct an exact pseudo-polynomial algorithm and an ε-approximation algorithm (FPTAS) for the problem.

<![CDATA[<b>Differential Evolution for the Control Gain's Optimal Tuning of a Four-bar Mechanism</b>]]> In this paper the variation of the velocity error of a four-bar mechanism with spring and damping forces is reduced by solving a dynamic optimization problem using a differential evolution algorithm with a constraint handling mechanism. The optimal design of the velocity control for the mechanism is formulated as a dynamic optimization problem. Moreover, in order to compare the results of the differential evolution algorithm, a simulation experiment of the proposed control strategy was carried out. The simulation results and discussion are presented in order to evaluate the performance of both approaches in the control of the mechanism.

<![CDATA[<b>Patrones de implementación para incluir comportamientos proactivos</b>]]> La programación orientada a objeto enfrenta retos como es el desarrollo de software en ambientes distribuidos. En esta línea ha surgido el paradigma de agentes. Un agente exhibe comportamientos que lo diferencia de un objeto, como la autonomía y la proactividad. La proactividad permite desarrollar sistemas dirigidos por metas, en los que no es necesaria una petición para que se inicie un trabajo. Incorporar proactividad a un software es hoy una necesidad, existe una gran dependencia de los sistemas computarizados y es mayor la delegación de tareas en ellos. Los patrones se han utilizado con éxito en la reducción de tiempo de desarrollo y el número de errores en el desarrollo de software, además de ser una guía para resolver un problema típico.
En este trabajo se presentan dos patrones de implementación para incorporar proactividad en un software y facilitar el trabajo con los agentes. Se incluye un caso de estudio del uso de los patrones propuestos en un observatorio tecnológico.<hr/>Object-oriented programming faces challenges such as the development of software in distributed environments. The agent paradigm has emerged along this line. An agent shows behaviors, such as autonomy and proactivity, that differentiate it from an object. Proactivity allows developing goal-directed systems, in which a request is not necessary to start a task. Adding proactivity to software is now a necessity: dependence on computer systems is heavy, and more and more tasks are delegated to them. Patterns have been used successfully to reduce development time and the number of errors in software, besides serving as a guide for solving a typical problem. In this paper, we present two implementation patterns to add proactivity to software and to make it easier to work with agents. A case study about the development of a technology observatory using both patterns is also included.

<![CDATA[<b>Redes neuronales dinámicas aplicadas a la recomendación musical optimizada</b>]]> En este trabajo se presenta un método basado en la operación de las llamadas redes neuronales dinámicas (RND), para la recomendación musical optimizada. Las redes son entrenadas con las señales de cada melodía, y no con descriptores tradicionales. La propuesta fue probada con una base de datos compuesta por 1,000 melodías, a diferentes frecuencias de muestreo.<hr/>A method based on the operation of so-called dynamic neural networks (DNNs) for optimized music recommendation is described. DNNs are trained with the signals of each melody and not with traditional descriptors. The method has been tested with a database composed of 1.200 melodies, at different sampling frequencies.