A Consensus algorithm for approximate pattern matching in protein sequences

Alba, A.; Rubio-Rincón, M.; Rodríguez-Kessler, M.; Arce-Santana, E.R.; Méndez, M.O.


Revista mexicana de ingeniería biomédica

Online version ISSN 2395-9126; print version ISSN 0188-9532

Abstract

ALBA, A. et al. A Consensus algorithm for approximate pattern matching in protein sequences. Rev. mex. ing. bioméd [online]. 2012, vol.33, n.2, pp.87-99. ISSN 2395-9126.

In bioinformatics, approximate string matching is one of the main tools that allow scientists to find common characteristics in protein or DNA sequences of different species. From a computational point of view, the difficulty of approximate string matching lies in finding measures that compare two strings efficiently, since in many cases searches must be performed in real time within large databases. In this paper we propose a novel method for approximate string matching based on a generalization of the algorithm proposed by Baeza-Yates and Perleberg in 1996 for computing the Hamming distance between two sequences. We also present a post-processing stage that significantly reduces the number of false positives. The proposed method has been evaluated on synthetic cases of random sequences and on real cases of plant protein sequences. The results show that the proposed algorithm is highly efficient, both computationally and in terms of specificity, especially when compared against a previously published method based on the phase correlation function.
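For readers unfamiliar with counting-based Hamming matching, the following is a minimal sketch in the spirit of the Baeza-Yates and Perleberg approach mentioned in the abstract. It is not the consensus generalization or the post-processing stage proposed in the paper, which the abstract does not describe in detail. The function names `hamming_profile` and `find_approximate`, the threshold parameter `k`, and the example sequences are assumptions made for illustration only.

```python
from collections import defaultdict


def hamming_profile(text, pattern):
    """Return the Hamming distance between `pattern` and every
    length-len(pattern) window of `text`, using a counting scheme."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # For each symbol, record the positions where it occurs in the pattern.
    offsets = defaultdict(list)
    for j, ch in enumerate(pattern):
        offsets[ch].append(j)

    # matches[s] counts positions where pattern agrees with text[s:s+m].
    matches = [0] * (n - m + 1)
    for i, ch in enumerate(text):
        for j in offsets.get(ch, ()):
            start = i - j  # alignment at which text[i] meets pattern[j]
            if 0 <= start <= n - m:
                matches[start] += 1

    # Hamming distance = pattern length minus number of agreeing positions.
    return [m - c for c in matches]


def find_approximate(text, pattern, k):
    """Return window start positions whose Hamming distance is at most k."""
    return [s for s, d in enumerate(hamming_profile(text, pattern)) if d <= k]


if __name__ == "__main__":
    # Hypothetical amino-acid strings, chosen only to illustrate the idea.
    print(find_approximate("ACDEFGACDQFG", "ACDEFG", k=1))
```

Each text symbol casts one "vote" for every alignment at which it would agree with the pattern, so the work is proportional to the number of symbol coincidences rather than to the product of the two lengths in every case. This tends to be favorable for large alphabets such as the 20 amino acids.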

Keywords: approximate string matching; protein sequences; bioinformatics; consensus algorithms.
