Scielo RSS <![CDATA[Polibits]]> http://www.scielo.org.mx/rss.php?pid=1870-904420130002&lang=pt vol. num. 48 lang. pt <![CDATA[SciELO Logo]]> http://www.scielo.org.mx/img/en/fbpelogp.gif http://www.scielo.org.mx

<![CDATA[<b>Editorial</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200001&lng=pt&nrm=iso&tlng=pt

<![CDATA[<b>Uncertainty Levels of Second-Order Probability</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200002&lng=pt&nrm=iso&tlng=pt
Since second-order probability distributions assign probabilities to probabilities, there is uncertainty on two levels. Although different types of uncertainty have been distinguished and corresponding measures suggested, the distinction made here between first- and second-order levels of uncertainty has not been considered before. In this paper, previously existing measures are considered from the perspective of first- and second-order uncertainty, and new measures are introduced. We conclude that the concepts of uncertainty and informativeness need to be qualified when used in a second-order probability context, and we suggest that, from a certain point of view, information cannot be minimized, only shifted from one level to another.

<![CDATA[<b>Triangle-Triangle Intersection Determination and Classification to Support Qualitative Spatial Reasoning</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200003&lng=pt&nrm=iso&tlng=pt
In CAD/CAM modeling, objects are represented using the Boundary Representation (ANSI Brep) model. Detection of possible intersections between objects can be based on the objects' boundaries (i.e., triangulated surfaces) and computed using triangle-triangle intersection. Usually only a cross-intersection algorithm is needed; however, it is beneficial to have a single robust and fast intersection detection algorithm for both cross and coplanar intersections. For qualitative spatial reasoning, a general-purpose algorithm is desirable for accurately differentiating the relations in a region connection calculus, a task that requires consideration of intersections between objects. Herein we present a complete, uniform, integrated algorithm for both cross and coplanar intersection. Additionally, we present parametric methods for classifying and computing intersection points. This work is applicable to most region connection calculi, particularly VRCC-3D+, which detects intersections between 3D objects as well as between their 2D projections, which are essential for occlusion detection.
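As a rough illustration of the case analysis such an intersection test performs, the sketch below dispatches a triangle pair to the coplanar or cross case using plane-side (signed volume) tests. It is a minimal sketch of a common first step, not the authors' complete algorithm; the function names, tolerance, and example triangles are illustrative assumptions.

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def plane_side(tri, p):
    """Signed volume: its sign tells which side of tri's supporting plane p lies on."""
    a, b, c = tri
    return dot(cross(sub(b, a), sub(c, a)), sub(p, a))

def intersection_case(t1, t2, eps=1e-9):
    """Coarse dispatch performed before an exact triangle-triangle test."""
    d = [plane_side(t1, v) for v in t2]
    if all(abs(x) <= eps for x in d):
        return "coplanar"         # same supporting plane: a 2D overlap test is needed
    if all(x > eps for x in d) or all(x < -eps for x in d):
        return "separated"        # t2 lies strictly on one side of t1's plane: no intersection
    return "cross-candidate"      # t2 straddles t1's plane: interval tests are still needed

t1 = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
t2 = ((0.2, 0.2, -0.5), (0.2, 0.2, 0.5), (0.8, 0.8, 0.0))
print(intersection_case(t1, t2))  # 'cross-candidate'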
<![CDATA[<b>ST Algorithm for Medical Diagnostic Reasoning</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200004&lng=pt&nrm=iso&tlng=pt
The authors have previously described an approach to medical diagnostic reasoning based on the ST (Select and Test) model introduced by Ramoni and Stefanelli et al. This paper extends the previous approach by introducing the algorithm required for medical expert system development. The algorithm involves a bottom-up, recursive process using the logical inferences of abduction, deduction, and induction. Pseudocode for the algorithm and the data structures involved are described, and the algorithm's implementation, programmed in Java with a small sample knowledge base, is included in the appendixes. Implementation of a successful expert system is a challenging process; developing the necessary algorithm for its inference engine and defining a knowledge base structure that models expert diagnostic reasoning and knowledge only fulfils the initial step. Challenges associated with the remaining steps of the development process can be identified and dealt with using the CLAP software process model.
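To make the select-and-test cycle concrete, here is a minimal, hypothetical sketch of such a loop over a toy knowledge base. The data structures, scoring rule, and threshold are assumptions for illustration and do not reproduce the paper's algorithm or its Java implementation.

def select_and_test(observed, kb, threshold=0.5):
    """One select-and-test pass: abduce candidates, deduce expected findings,
    and inductively retain those that the observations support well enough.

    kb maps each diagnosis to the set of findings it is expected to produce."""
    # Abduction: select candidate diagnoses that could explain at least one finding.
    candidates = [d for d, expected in kb.items() if expected & observed]
    ranked = []
    for d in candidates:
        expected = kb[d]                        # Deduction: findings implied by d.
        coverage = len(expected & observed) / len(expected)
        if coverage >= threshold:               # Induction (test): keep well-supported ones.
            ranked.append((d, coverage))
    return sorted(ranked, key=lambda pair: -pair[1])

kb = {"influenza": {"fever", "cough", "fatigue"},
      "allergic rhinitis": {"sneezing", "itchy eyes", "congestion"}}
print(select_and_test({"fever", "cough"}, kb))  # [('influenza', 0.66...)]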
<![CDATA[<b>A Logic Programming Approach to the Conservation of Buildings Based on an Extension of the Eindhoven Classification Model</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200005&lng=pt&nrm=iso&tlng=pt
The identification, classification and recording of events that may lead to the deterioration of buildings are crucial for the development of appropriate repair strategies. This work presents an extension of the Eindhoven Classification Model to sort the root causes of adverse events in building conservation. Logic programming was used for knowledge representation and reasoning, allowing the universe of discourse to be modelled in terms of defective data, information and knowledge. Indeed, a systematization of the evolution of the body of knowledge in terms of a new factor, Quality of Information, embedded in the Root Cause Analysis was accomplished; i.e., the proposed system led to a process of Quality of Information quantification that permits the timely study of an event's root causes.

<![CDATA[<b>Merging Deductive and Abductive Knowledge Bases</b>: <b>An Argumentation Context Approach</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200006&lng=pt&nrm=iso&tlng=pt
The consideration of heterogeneous knowledge sources for supporting decision making is key to accomplishing informed decisions, e.g., about medical diagnosis. Consequently, merging data from different knowledge bases is a key issue in providing support for decision making. In this paper, we explore an argumentation context approach, which follows how medical professionals typically reason, in order to merge two basic kinds of reasoning approaches based on logic programs: deductive and abductive inference. In this setting, we introduce two kinds of argumentation frameworks: deductive argumentation frameworks and abductive argumentation frameworks. For merging these argumentation frameworks, we follow an approach based on argumentation context systems. We illustrate the approach by considering two different declarative specifications of evidence-based medical knowledge as logic programs in order to support informed medical decisions.

<![CDATA[<b>Multiscale RBF Neural Network for Forecasting of Monthly Hake Catches off Southern Chile</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200007&lng=pt&nrm=iso&tlng=pt
We present a forecasting strategy based on the stationary wavelet transform combined with a radial basis function (RBF) neural network to improve the accuracy of 3-month-ahead forecasting of hake catches for the fisheries industry of central southern Chile. The general idea of the proposed forecasting model is to decompose the raw data set into an annual cycle component and an inter-annual component by using a 3-level stationary wavelet decomposition. The components are independently predicted using an autoregressive RBF neural network model. The RBF neural network model is composed of linear and nonlinear weights, which are estimated using the separable nonlinear least squares method. Consequently, the proposed forecaster is the co-addition of the two predicted components. We demonstrate the utility of the proposed model on a hake catches data set of monthly periods from 1963 to 2008. Experimental results on the hake catches data show that the autoregressive RBF neural network model is effective for 3-month-ahead forecasting.

<![CDATA[<b>Supply Chain Management by Means of Simulation</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200008&lng=pt&nrm=iso&tlng=pt
Several changes in the macro environment of companies over the last two decades have meant that competition is no longer constrained to the product itself but extends to the overall concept of the supply chain. Under these circumstances, supply chain management stands as a major concern for companies nowadays. One of the prime goals to be achieved is the reduction of the Bullwhip Effect, i.e., the amplification of demand experienced by the different levels of the chain the further away they are from the customer; it is a major cause of inefficiency in the supply chain. Thus, this paper presents the application of simulation techniques to the study of the Bullwhip Effect, in comparison with modern alternatives such as the representation of the supply chain as a network of intelligent agents. We conclude that supply chain simulation is a particularly interesting tool for performing sensitivity analyses in order to measure the impact of changes in a quantitative parameter on the generated Bullwhip Effect. By way of example, a sensitivity analysis for safety stock has been performed to assess the relationship between the Bullwhip Effect and safety stock.
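As a toy illustration of how simulation can expose the Bullwhip Effect (not the paper's simulation model), the sketch below runs a simple multi-echelon chain in which each level forecasts incoming orders with a moving average and follows an order-up-to policy. The demand distribution, policy, and all parameters are assumptions chosen only to make the amplification visible.

import random
import statistics

def simulate_bullwhip(periods=600, levels=4, window=4, lead_time=2, warmup=100, seed=1):
    """Toy multi-echelon simulation: returns, for each level, the variance of the
    orders it places divided by the variance of end-customer demand."""
    random.seed(seed)
    demand = [max(0.0, random.gauss(100, 10)) for _ in range(periods)]
    base_var = statistics.pvariance(demand[warmup:])
    orders, ratios = demand, []
    for _ in range(levels):
        history, placed, position = [], [], 0.0
        for incoming in orders:
            history.append(incoming)
            forecast = statistics.fmean(history[-window:])
            target = forecast * (lead_time + 1)   # cover forecast demand over the lead time
            position -= incoming                  # demand depletes the inventory position
            order = max(0.0, target - position)   # order up to the target level
            position += order
            placed.append(order)
        ratios.append(statistics.pvariance(placed[warmup:]) / base_var)
        orders = placed                           # this level's orders are the next level's demand
    return ratios

print(simulate_bullwhip())  # variance amplification typically grows level by level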
<![CDATA[<b>A POS Tagger for Social Media Texts Trained on Web Comments</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200009&lng=pt&nrm=iso&tlng=pt
The use of social media tools such as blogs and forums has become more and more popular in recent years. Hence, a huge collection of social media texts from different communities is available for accessing user opinions, e.g., for marketing studies or acceptance research. Typically, methods from Natural Language Processing are applied to social media texts to automatically recognize user opinions. A fundamental component of the linguistic pipeline in Natural Language Processing is Part-of-Speech tagging. Most state-of-the-art Part-of-Speech taggers are trained on newspaper corpora, which differ in many ways from non-standardized social media text. Hence, applying common taggers to such texts results in performance degradation. In this paper, we present extensions to a basic Markov model tagger for the annotation of social media texts. Following the German standard Stuttgart-Tübingen TagSet (STTS), we distinguish 54 tag classes. Applying our approach improves the tagging accuracy for social media texts considerably when we train our model on a combination of annotated texts from newspapers and Web comments.

<![CDATA[<b>Non-continuous Syntactic N-grams</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200010&lng=pt&nrm=iso&tlng=pt
In this paper, we present the concept of non-continuous syntactic n-grams. In our previous works we introduced the general concept of syntactic n-grams, i.e., n-grams that are constructed by following paths in syntactic trees. Their great advantage is that they allow introducing purely linguistic (syntactic) information into machine learning methods. Their disadvantage is that prior automatic syntactic parsing is required. We also showed that their application in the authorship attribution task gives better results than the use of traditional n-grams. Still, in those works we considered only continuous syntactic n-grams, i.e., the paths in syntactic trees are not allowed to have bifurcations. In this paper, we propose to remove this limitation, so that all sub-trees of size n of a syntactic tree are considered as non-continuous syntactic n-grams. Note that continuous syntactic n-grams are a particular case of non-continuous syntactic n-grams. Future work should show which type of n-grams is more useful and for which NLP tasks. We also propose a formal manner of writing down (representing) non-continuous syntactic n-grams using parentheses and commas, for example, "a b [c [d, e], f]". In this paper, we also present examples of the construction of non-continuous syntactic n-grams for syntactic trees obtained with FreeLing and the Stanford parser.
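To illustrate the idea of sub-trees of size n with bifurcations, the sketch below enumerates all connected sub-trees with exactly n nodes of a small dependency tree and prints them in a parenthesis-and-comma style notation similar to the one described above. The toy tree, node names, and exact formatting are illustrative assumptions, not the paper's implementation.

from itertools import product

def rooted_subtrees(node, children):
    """All connected sub-trees rooted at `node`, as (bracket_notation, size) pairs."""
    options = [[None] + rooted_subtrees(kid, children) for kid in children.get(node, [])]
    results = []
    for choice in product(*options):
        kept = [c for c in choice if c is not None]
        size = 1 + sum(s for _, s in kept)
        if not kept:
            results.append((node, size))                    # the single node itself
        elif len(kept) == 1:
            results.append((f"{node} {kept[0][0]}", size))  # no bifurcation: plain chain
        else:
            inner = ", ".join(text for text, _ in kept)
            results.append((f"{node} [{inner}]", size))     # bifurcation: brackets and commas
    return results

def syntactic_ngrams(root, children, n):
    """Non-continuous syntactic n-grams: every sub-tree with exactly n nodes."""
    found, stack = [], [root]
    while stack:
        v = stack.pop()
        found += [text for text, size in rooted_subtrees(v, children) if size == n]
        stack.extend(children.get(v, []))
    return found

# Toy dependency tree for "John gave the ball to Mary" (head word -> dependents).
tree = {"gave": ["John", "ball", "to"], "ball": ["the"], "to": ["Mary"]}
print(syntactic_ngrams("gave", tree, 3))
# ['gave to Mary', 'gave [ball, to]', 'gave ball the', 'gave [John, to]', 'gave [John, ball]']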
<![CDATA[<b>More Effective Boilerplate Removal - the GoldMiner Algorithm</b>]]> http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442013000200011&lng=pt&nrm=iso&tlng=pt
The ever-increasing web is an important source for building large-scale corpora. However, dynamically generated web pages often contain much irrelevant and duplicated text, which impairs the quality of the corpus. To ensure the high quality of web-based corpora, a good boilerplate removal algorithm is needed to extract only the relevant content from web pages. In this article, we present an automatic text extraction procedure, GoldMiner, which, by enhancing a previously published boilerplate removal algorithm, minimizes the occurrence of irrelevant, duplicated content in corpora and keeps the text more coherent than previous tools. The algorithm exploits similarities in the HTML structure of pages coming from the same domain. A new evaluation document set (CleanPortalEval) is also presented, which demonstrates the effectiveness of boilerplate removal algorithms for web portal pages.
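As a rough sketch of the cross-page idea (pages from the same domain tend to share boilerplate blocks), and not of the GoldMiner algorithm itself, the snippet below drops text blocks that recur across many pages of one domain. The block representation and the threshold are assumptions made for illustration.

from collections import Counter

def strip_recurring_blocks(pages, min_fraction=0.5):
    """Toy same-domain filter: a text block appearing on a large fraction of a
    domain's pages is treated as boilerplate (navigation, footer, ads) and removed.

    pages: list of pages, each a list of text blocks (e.g., paragraph strings)."""
    seen_on = Counter()
    for blocks in pages:
        seen_on.update({b.strip() for b in blocks})  # count each block once per page
    cutoff = max(2, int(min_fraction * len(pages)))
    return [[b for b in blocks if seen_on[b.strip()] < cutoff] for blocks in pages]

pages = [
    ["Home | News | Contact", "Article one body text.", "Copyright 2013"],
    ["Home | News | Contact", "Article two body text.", "Copyright 2013"],
    ["Home | News | Contact", "Article three body text.", "Copyright 2013"],
]
print(strip_recurring_blocks(pages)[0])  # ['Article one body text.']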