<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0188-9532</journal-id>
<journal-title><![CDATA[Revista mexicana de ingeniería biomédica]]></journal-title>
<abbrev-journal-title><![CDATA[Rev. mex. ing. bioméd]]></abbrev-journal-title>
<issn>0188-9532</issn>
<publisher>
<publisher-name><![CDATA[Sociedad Mexicana de Ingeniería Biomédica]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0188-95322011000100005</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[A multiple-filter-GA-SVM method for dimension reduction and classification of DNA-microarray data]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Hernández Montiel]]></surname>
<given-names><![CDATA[L.A.]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Bonilla Huerta]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Morales Caporal]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Instituto Tecnológico de Apizaco, Laboratorio de Investigación en Tecnologías Inteligentes]]></institution>
<addr-line><![CDATA[Apizaco Tlaxcala]]></addr-line>
<country>México</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>07</month>
<year>2011</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>07</month>
<year>2011</year>
</pub-date>
<volume>32</volume>
<numero>1</numero>
<fpage>32</fpage>
<lpage>39</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S0188-95322011000100005&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S0188-95322011000100005&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S0188-95322011000100005&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[This article proposes a multiple-filter method that uses a genetic algorithm (GA) combined with a support vector machine (SVM) for gene selection and classification of DNA-microarray data. The proposed method is designed to select a subset of relevant genes that classifies the DNA-microarray data more accurately. First, three traditional statistical methods are used for gene selection. Then different relevant gene subsets are selected by a GA/SVM framework that uses leave-one-out cross validation (LOOCV) to avoid overfitting. A gene subset (niche) consisting of relevant genes is obtained from each statistical method by analyzing the frequency of each gene in the different gene subsets. Finally, the most frequent genes contained in the niche are evaluated by the GA/SVM to obtain a final relevant gene subset. The proposed method is tested on two DNA-microarray datasets: leukemia and colon. The experimental results show that the Multiple-Filter-GA-SVM (MF-GA-SVM) works very well, achieving lower classification error rates with a smaller number of selected genes than other methods reported in the literature.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[El presente trabajo propone un múltiple-filtro utilizando un algoritmo genético (AG) combinado con una máquina de soporte vectorial (MSV) para la selección de genes y la clasificación de datos obtenidos de micro-arreglos de ADN. El método propuesto es diseñado para seleccionar un sub-conjunto de genes pertinentes que clasifiquen los datos obtenidos de micro-arreglos de ADN más eficientemente. Primero, tres métodos estadísticos tradicionales son usados para la selección de genes. Luego, diferentes sub-conjuntos de genes pertinentes son seleccionados por medio de una estructura AG-MSV utilizando la técnica deja uno fuera de validación cruzada (DUFVC) para evitar el sobre-entrenamiento de los datos. Un sub-conjunto de genes (nicho), que consiste de genes pertinentes, es obtenido de cada método estadístico, al cual analiza la frecuencia de cada gen en diferentes sub-conjuntos de genes. Finalmente, los genes más frecuentes contenidos en el nicho son nuevamente evaluados por la estructura AG-MSV para obtener a sub-conjunto final de genes pertinentes. El método propuesto es evaluado en dos bases de micro-arreglos: Leucemia y colon. En los resultados experimentales se observa que el múltiple-filtro-AG-MSV trabajo muy bien logrando bajas tasas de error en la clasificación usando un número pequeño de genes más que otros métodos reportados en la literatura.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[DNA-microarrays]]></kwd>
<kwd lng="en"><![CDATA[filters]]></kwd>
<kwd lng="en"><![CDATA[wrappers]]></kwd>
<kwd lng="en"><![CDATA[genetic algorithm]]></kwd>
<kwd lng="en"><![CDATA[support vector machine]]></kwd>
<kwd lng="en"><![CDATA[gene selection]]></kwd>
<kwd lng="es"><![CDATA[Micro-arreglos de ADN]]></kwd>
<kwd lng="es"><![CDATA[filtros]]></kwd>
<kwd lng="es"><![CDATA[envoltorios]]></kwd>
<kwd lng="es"><![CDATA[algoritmos genéticos]]></kwd>
<kwd lng="es"><![CDATA[máquina de soporte vectorial]]></kwd>
<kwd lng="es"><![CDATA[selección de genes]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[  	    <p align="justify"><font face="verdana" size="4">Art&iacute;culo de investigaci&oacute;n original</font></p>     <p align="justify">&nbsp;</p>      <p align="center"><font face="verdana" size="4"><b>A multiple&#45;filter&#45;GA&#45;SVM method for dimension reduction and classification of DNA&#45;microarray data</b></font></p>  	    <p align="center"><b><font face="verdana" size="2">Hern&aacute;ndez Montiel L.A.*, Bonilla Huerta E.*, Morales Caporal R.*</font></b><font face="verdana" size="2"></font></p> 	    <p align="justify">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><i>* Laboratorio de Investigaci&oacute;n en Tecnolog&iacute;as Inteligentes. Instituto Tecnol&oacute;gico de Apizaco.</i></font></p>  	    <p align="justify">&nbsp;</p> 	    <p align="justify"><font face="verdana" size="2"><b>Correspondence:</b>    <br> 	</font><font face="verdana" size="2">L.A. Hern&aacute;ndez Montiel.    ]]></body>
<body><![CDATA[<br> 	</font><font face="verdana" size="2">Av. Instituto Tecnol&oacute;gico s/n.    <br> 	</font><font face="verdana" size="2">90300. Apizaco, Tlaxcala, M&eacute;xico.    <br> 	</font><font face="verdana" size="2">{edbonn, luisahm, morales&#45;caporal}@ita&#45;</font><font face="verdana" size="2">pizaco.edu.mx</font></p>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="verdana" size="2">Received article: 18/march/2011.     <br> </font><font face="verdana" size="2">Accepted article: 30/may/2011.</font></p>     <p align="justify">&nbsp;</p>  	    <p align="justify"><font face="verdana" size="2"><b>ABSTRACT</b></font></p>  	    <p align="justify"><font face="verdana" size="2">This article proposes a multiple&#45;filter method that uses a genetic algorithm (GA) combined with a support vector machine (SVM) for gene selection and classification of DNA&#45;microarray data. The proposed method is designed to select a subset of relevant genes that classifies the DNA&#45;microarray data more accurately. First, three traditional statistical methods are used for gene selection. Then different relevant gene subsets are selected by a GA/SVM framework that uses leave&#45;one&#45;out cross validation (LOOCV) to avoid overfitting. A gene subset (niche) consisting of relevant genes is obtained from each statistical method by analyzing the frequency of each gene in the different gene subsets. Finally, the most frequent genes contained in the niche are evaluated by the GA/SVM to obtain a final relevant gene subset. The proposed method is tested on two DNA&#45;microarray datasets: leukemia and colon. 
The experimental results show that the Multiple&#45;Filter&#45;GA&#45;SVM (MF&#45;GA&#45;SVM) works very well, achieving lower classification error rates with a smaller number of selected genes than other methods reported in the literature.</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Key Words:</b> DNA&#45;microarrays, filters, wrappers, genetic algorithm, support vector machine, gene selection.</font></p>     ]]></body>
<body><![CDATA[<p align="justify">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><b>RESUMEN</b></font></p>  	    <p align="justify"><font face="verdana" size="2">El presente trabajo propone un m&uacute;ltiple&#45;filtro utilizando un algoritmo gen&eacute;tico (AG) combinado con una m&aacute;quina de soporte vectorial (MSV) para la selecci&oacute;n de genes y la clasificaci&oacute;n de datos obtenidos de micro&#45;arreglos de ADN. El m&eacute;todo propuesto es dise&ntilde;ado para seleccionar un sub&#45;conjunto de genes pertinentes que clasifiquen los datos obtenidos de micro&#45;arreglos de ADN m&aacute;s eficientemente. Primero, tres m&eacute;todos estad&iacute;sticos tradicionales son usados para la selecci&oacute;n de genes. Luego, diferentes sub&#45;conjuntos de genes pertinentes son seleccionados por medio de una estructura AG&#45;MSV utilizando la t&eacute;cnica deja uno fuera de validaci&oacute;n cruzada (DUFVC) para evitar el sobre&#45;entrenamiento de los datos. Un sub&#45;conjunto de genes (nicho), que consiste de genes pertinentes, es obtenido de cada m&eacute;todo estad&iacute;stico, al cual analiza la frecuencia de cada gen en diferentes sub&#45;conjuntos de genes. Finalmente, los genes m&aacute;s frecuentes contenidos en el nicho son nuevamente evaluados por la estructura AG&#45;MSV para obtener a sub&#45;conjunto final de genes pertinentes. El m&eacute;todo propuesto es evaluado en dos bases de micro&#45;arreglos: Leucemia y colon. 
En los resultados experimentales se observa que el m&uacute;ltiple&#45;filtro&#45;AG&#45;MSV trabajo muy bien logrando bajas tasas de error en la </font><font face="verdana" size="2">clasificaci&oacute;n usando un n&uacute;mero peque&ntilde;o de genes m&aacute;s que otros m&eacute;todos reportados en la literatura.</font></p>     <p align="justify"><font face="verdana" size="2"><b>Palabras clave:</b> Micro&#45;arreglos de ADN, filtros, envoltorios, algoritmos gen&eacute;ticos, m&aacute;quina de soporte vectorial, selecci&oacute;n de genes.</font></p>     <p align="justify">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><b>INTRODUCTION</b></font></p>  	    <p align="justify"><font face="verdana" size="2">DNA microarray technology makes it possible to measure simultaneously the activity of tens of thousands of genes in a cell mixture. A great number of classification methods have been proposed for analyzing microarray data<sup>1&#45;5</sup>. In order to extract useful gene information from cancer microarray data and reduce dimensionality, we propose a hybrid model that combines a genetic algorithm (GA) for gene selection and a support vector machine (SVM) for classification. We propose this model to find subsets of genes with higher classification accuracy in two microarray datasets: Leukemia and Colon. This paper is organized as follows. Microarray technology is introduced first, followed by some preliminaries on the statistical methods (filters). The MF&#45;GA&#45;SVM model is then described, followed by a detailed description of genetic algorithms and support vector machines. 
Finally, the experimental results are analyzed and conclusions are drawn.</font></p>     <p align="justify">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><b>DNA MICROARRAY TECHNOLOGY</b></font></p>  	    <p align="justify"><font face="verdana" size="2">DNA microarray technology was first published in 1995 by M. Schena et al<sup>6</sup>. Typically a microarray (sometimes called a DNA chip) is a glass or plastic slide onto which DNA molecules are attached at fixed spots. There are tens of thousands of spots on an array. For gene expression studies, each spot should ideally identify one gene in the genome. Microarray technology allows biologists and researchers to measure the expression of thousands of genes simultaneously on a single chip.</font></p>  	    ]]></body>
<body><![CDATA[<p align="justify"><font face="verdana" size="2">This technology is based on the process of hybridization. The chip is arranged in a regular grid&#45;like pattern and segments of DNA strands are deposited within individual grids. <a href="#f1">Figure 1</a> shows the basic principles of a DNA microarray experiment. The procedure of a DNA microarray experiment includes several steps, from sample preparation to data analysis.</font></p> 	    <p align="center"><a name="f1"></a><img src="../img/revistas/rmib/v32n1/a5f1.jpg"></p>  	    <p align="justify"><font face="verdana" size="2">DNA (cDNA) chip for hybridization. The resulting chip is then scanned and processed to produce a two&#45;dimensional numerical array of microarray gene expression data that is used by data analysis algorithms.</font></p>     <p align="justify"><font face="verdana" size="2">A microarray experiment involves three basic steps: 1) sample preparation and labeling, 2) hybridization and washing and 3) microarray image scanning and processing.</font></p>  	    <p align="justify"><font face="verdana" size="2">The first step is sample preparation and labeling. DNA or RNA is isolated from both samples (normal/tumor, treated cells/control cells), transformed and labeled with fluorophores.</font></p>  	    <p align="justify"><font face="verdana" size="2">Hybridization and washing form the second step. Hybridization is the process of joining two complementary strands of DNA to form a double&#45;helix molecule. The labeled cDNA samples are mixed together and applied to the slide at a specific temperature to allow complementary sequences to anneal. Finally, the slides are washed to remove contaminants.</font></p>  	    <p align="justify"><font face="verdana" size="2">After the third step (scanning and processing), the microarray experiment produces a two&#45;dimensional array of numbers. 
Rows indicate genes and columns indicate samples, as shown in <a href="#f2">Figure 2</a>. Each column holds the expression levels of all genes in one sample of the microarray experiment. Each row holds the expression levels of one gene across the different sample tissues.</font></p> 	    <p align="center"><a name="f2"></a><img src="../img/revistas/rmib/v32n1/a5f2.jpg"></p>     <p align="justify"><font face="verdana" size="2">One particularity of microarray data is the great number of attributes (genes) compared with the very few samples available. This dimensionality problem makes the data difficult to understand and reduces the efficiency of classification algorithms. One of the key tasks in microarray data analysis is to classify samples according to their expression profiles. For a more detailed description of microarray gene expression experiments, please refer to<sup>7&#45;9</sup>.</font></p>     <p align="justify">&nbsp;</p>      ]]></body>
<body><![CDATA[<p align="justify"><font face="verdana" size="2"><b>METHODOLOGY</b></font></p>  	    <p align="justify"><font face="verdana" size="2">This work proposes a new method to reduce the initial dimension of microarray datasets and to select a relevant gene subset for classification. First, three statistical filters (BSS/WSS, Wilcoxon test and T&#45;statistic) are applied to filter relevant genes. Then the problem of gene selection is treated by a GA&#45;SVM approach that selects a relevant gene subset for the SVM classifier.</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Pre&#45;processing by min&#45;max normalization</b></font></p>  	    <p align="justify"><font face="verdana" size="2">The pre&#45;processing procedure is a very important task in gene selection and classification. In this process, noisy, irrelevant and inconsistent data are eliminated. We normalize the gene expression levels of each dataset into the interval &#91;0,1&#93; using the minimum and maximum expression values of each gene. Due to the small size of the training sets for the leukemia and colon tumor datasets, leave&#45;one&#45;out cross validation (LOOCV) is used to select the training and testing sets.</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Data filtering</b></font></p>  	    <p align="justify"><font face="verdana" size="2">Filters, or filtering techniques, reduce the dimension of a dataset by keeping the most relevant or informative genes, which enhances generalization performance. In this work three filters are used to reduce the colon cancer<sup>1</sup> and leukemia<sup>3</sup> datasets. 
The three filters are BSS/WSS, the Wilcoxon test and the T&#45;statistic test (described below).</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>A) BSS/WSS</b></font></p>  	    <p align="justify"><font face="verdana" size="2">We use the gene selection filter proposed by Dudoit<sup>10</sup>, namely the ratio of the sums of squares between groups (Between Sum Square&#45;BSS) and within groups (Within Sum Square&#45;WSS). For each gene, this ratio compares the distance of each class center to the overall center with the distance of each sample to its class center. The equation for a given gene <i>j</i> has the form:</font></p>  	    <p align="center"><img src="../img/revistas/rmib/v32n1/a5e1.jpg"></p>     <p align="justify"><font face="verdana" size="2">Where <i>y<sub>i</sub></i> denotes the class label of sample <i>i</i>,<b><i> <img src="../img/revistas/rmib/v32n1/a5i1.jpg"></i></b> denotes the average expression level of gene j across all samples and<i> <img src="../img/revistas/rmib/v32n1/a5i2.jpg"></i> denotes the average expression level of gene j across the samples belonging to class k (k = 1, 2). In this work the top&#45;p genes ranked by BSS/WSS are selected.</font></p>      ]]></body>
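<body><![CDATA[<p align="justify"><font face="verdana" size="2">As an illustration only, the min&#45;max normalization and the BSS/WSS ranking described above can be sketched in Python. The function names, the layout <i>data[i][j]</i> (sample <i>i</i>, gene <i>j</i>) and the treatment of constant genes are assumptions of this sketch; the experiments reported in this article were run in Matlab.</font></p>
<pre>
```python
def min_max_normalize(column):
    """Scale the expression values of one gene into the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant gene carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def bss_wss(values, labels):
    """BSS/WSS ratio of one gene; values[i] belongs to class labels[i]."""
    overall = sum(values) / len(values)
    bss = 0.0
    wss = 0.0
    for k in set(labels):
        group = [v for v, y in zip(values, labels) if y == k]
        mean_k = sum(group) / len(group)
        # Between-class term: class center against the overall center.
        bss += len(group) * (mean_k - overall) ** 2
        # Within-class term: each sample against its own class center.
        wss += sum((v - mean_k) ** 2 for v in group)
    if wss == 0:
        return float("inf") if bss > 0 else 0.0
    return bss / wss


def top_p_genes(data, labels, p):
    """Return the indices of the p genes with the highest BSS/WSS ratio."""
    n_genes = len(data[0])
    scores = []
    for j in range(n_genes):
        column = min_max_normalize([row[j] for row in data])
        scores.append((bss_wss(column, labels), j))
    ranked = sorted(scores, key=lambda s: s[0], reverse=True)
    return [j for _, j in ranked[:p]]
```
</pre>
<p align="justify"><font face="verdana" size="2">With <i>p</i> = 50, <i>top_p_genes</i> would return the kind of reduced gene list that is passed on to the GA/SVM stage. The ratio itself does not depend on the scale of a gene, so the normalization matters mainly for the classifier.</font></p>]]></body>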
<body><![CDATA[<p align="justify"><font face="verdana" size="2"><b>B) Wilcoxon rank sum test</b></font></p>  	    <p align="justify"><font face="verdana" size="2">The Wilcoxon rank sum test (denoted W) is a non&#45;parametric criterion used for feature selection. The statistic is the sum of the ranks of the samples in the smaller class. The main steps of this filter are as follows<sup>11</sup>:</font></p>  	    <blockquote> 	      <p align="justify"><font face="verdana" size="2">1.&nbsp;Combine all observations from the two populations and rank them in ascending order of value. If some observations have tied values, assign each observation in a tie the average rank of the tied group.</font></p> 	      <p align="justify"><font face="verdana" size="2">2.&nbsp;Add all the ranks associated with the observations from the smaller group. This gives W.</font></p> 	      <p align="justify"><font face="verdana" size="2">3.&nbsp;The p&#45;value associated with the Wilcoxon statistic is found from the Wilcoxon rank sum distribution table. In this work the statistic is obtained from Matlab.</font></p> </blockquote>      <p align="justify"><font face="verdana" size="2"><b>C) T&#45;Statistic</b></font></p>  	    <p align="justify"><font face="verdana" size="2">The standard t&#45;statistic is the most widely used criterion for identifying differentially expressed genes. Each sample is labeled with a value in {1, &#45;1}. For each feature <i>f<sub>j</sub></i>, the means <i>&micro;</i><sup>1</sup><sub>j</sub> and <i>&micro;</i><sup>&#45;1</sup><sub>j</sub> and the standard deviations <i>&delta;<sup>1</sup><sub>j</sub></i> and <i>&delta;<sup>&#45;1</sup><sub>j</sub></i> are calculated using only the samples labeled 1 and &#45;1 respectively. 
Then a score <i>Tf<sub>j</sub></i> can be obtained by<sup>9</sup>:</font></p>     <p align="center"><img src="../img/revistas/rmib/v32n1/a5e2.jpg"></p>      <p align="justify"><font face="verdana" size="2">Where <i>n<sub>1</sub></i> and <i>n<sub>2</sub></i> are the numbers of samples labeled 1 and &#45;1 respectively. A large absolute t&#45;statistic indicates the most discriminatory features (genes).</font></p>     ]]></body>
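<body><![CDATA[<p align="justify"><font face="verdana" size="2">The two remaining filters can be sketched in the same way. The rank sum below follows the three steps listed for the Wilcoxon test (without the p&#45;value lookup). For the t&#45;score, the unequal&#45;variance form (&micro;<sup>1</sup> &#45; &micro;<sup>&#45;1</sup>) / sqrt(&delta;<sup>1</sup>&sup2;/n<sub>1</sub> + &delta;<sup>&#45;1</sup>&sup2;/n<sub>2</sub>) with sample variances is an assumption of this sketch, as are all function names.</font></p>
<pre>
```python
import math


def rank_sum(values, labels):
    """Wilcoxon rank sum W: sum of the ranks of the smaller class."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while len(order) > start:
        end = start
        # Extend the run while the next observation is tied with this one.
        while len(order) > end + 1 and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Average of the 1-based ranks start+1 .. end+1 for the tied run.
        average = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    smaller = min(set(labels), key=lambda c: labels.count(c))
    return sum(r for r, y in zip(ranks, labels) if y == smaller)


def t_score(values, labels):
    """Two-sample t-score of one gene; labels are 1 or -1.

    Assumes at least two samples in each class.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]

    def mean(xs):
        return sum(xs) / len(xs)

    def variance(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    denom = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if denom == 0:
        return 0.0  # assumption: no spread in either class, score set to 0
    return (mean(pos) - mean(neg)) / denom
```
</pre>
<p align="justify"><font face="verdana" size="2">Genes would then be ranked by the absolute value of <i>t_score</i>, or by the p&#45;value derived from <i>rank_sum</i>, before the top <i>p</i> genes are kept.</font></p>]]></body>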
<body><![CDATA[<p align="justify">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><b>GENERAL MODEL MFGASVM FOR GENE SELECTION AND CLASSIFICATION</b></font></p>  	    <p align="justify"><font face="verdana" size="2">In this section, we introduce the proposed Multiple&#45;Filter&#45;Wrapper for gene selection and classification of DNA&#45;microarray datasets, which is depicted in <a href="#f2">Figure 2</a>. In the first step, a statistical filtering/ranking method is applied to rank genes. This means that each gene is evaluated and ranked according to a statistical filter. Three filters are used in this work (Wilcoxon test, BSS/WSS and t&#45;statistic). The first <i>p</i> (50) genes with the highest ranking scores are then selected. In the second step, for each set of <i>p</i> selected genes, gene selection is performed by a GA/SVM method (details of the GA/SVM are given in the following section). The GA/SVM method is executed for each filter, and the gene subsets that achieve high performance with the SVM classifier are stored in a niche.</font></p>  	    <p align="justify"><font face="verdana" size="2">Next, the genes with the highest frequency are selected from the niche and the GA/SVM method is executed again to obtain a final gene subset (<a href="#f3">Figure 3</a>). 
Multiple&#45;Filter&#45;GA&#45;SVM is proposed to reduce the dimensionality of DNA&#45;microarray datasets, to improve the classification accuracy and to obtain a good candidate gene subset for classification.</font></p> 	    <p align="center"><a name="f3"></a><img src="../img/revistas/rmib/v32n1/a5f3.jpg"></p>     <p align="center">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><b>GENE SELECTION AND CLASSIFICATION BY USING A GA&#45;SVM METHOD</b></font></p>  	    <p align="justify"><font face="verdana" size="2">In this study a multiple&#45;filter&#45;GA/SVM method is proposed to handle gene selection and classification of DNA&#45;microarray datasets. Gene selection is carried out by a genetic algorithm (GA), which searches for the best gene subset within the dataset. For classification we use a Support Vector Machine (SVM), whose accuracy serves as the fitness function that evaluates each candidate gene subset.</font></p>     <p align="justify"><font face="verdana" size="2"><b>Genetic algorithm</b></font></p>  	    <p align="justify"><font face="verdana" size="2">Genetic algorithms (GAs) are adaptive methods that can be used to solve optimization problems<sup>12</sup>. GAs are stochastic search algorithms based on the process of natural selection. A GA evolves a population of individuals, where each individual represents a candidate solution for a given problem. A fitness function is defined to evaluate the quality of each candidate solution. Finally, genetic operators are specified for this evolutionary process<sup>13</sup>.</font></p>  	    ]]></body>
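<body><![CDATA[<p align="justify"><font face="verdana" size="2">Before the GA operators are detailed, the niche step of the general model (counting how often each gene appears in the high&#45;performing subsets kept from the three filters) can be sketched as follows. The function name, the use of gene indices as identifiers and the alphabetical tie&#45;break are assumptions of this sketch.</font></p>
<pre>
```python
from collections import Counter


def most_frequent_genes(niche, top_n):
    """Return the top_n genes that occur in the most subsets of the niche.

    niche is a list of gene subsets, each an iterable of gene identifiers.
    """
    counts = Counter()
    for subset in niche:
        counts.update(set(subset))  # count a gene once per subset
    # Sort by decreasing frequency; ties are broken by gene identifier.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_n]]
```
</pre>
<p align="justify"><font face="verdana" size="2">The genes returned by <i>most_frequent_genes</i> would form the reduced search space for the final GA/SVM run.</font></p>]]></body>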
<body><![CDATA[<p align="justify"><font face="verdana" size="2">In this study, a new population is obtained from the current population P by applying the genetic operators as follows: a) two parents are selected and crossover is applied (with a given probability) to create two new solutions, which are then mutated (with a given probability), and b) the parents are replaced by their descendants (offspring). These two actions are repeated a predetermined number of times (number of generations). Finally, the elite chromosomes (with a given probability) are copied into population P to replace the worst chromosomes. At this point, a generation is complete. <a href="#f4">Figure 4</a> shows the overall operation of the GA.</font></p> 	    <p align="center"><a name="f4"></a><img src="../img/revistas/rmib/v32n1/a5f4.jpg"></p>      <p align="justify"><font face="verdana" size="2"><i>Chromosome representation and population initialization</i></font></p>  	    <p align="justify"><font face="verdana" size="2">A chromosome represents a candidate gene subset. It is a binary string whose length equals the number of genes selected by the filter method, so each bit encodes a single gene. A bit set to '1' means that the gene is included in the gene subset, whereas a '0' indicates that the gene is excluded. The length of the chromosome is denoted in this article as I. The initial population is generated randomly following a uniform distribution. <a href="#f5">Figure 5</a> shows the representation of a chromosome.</font></p> 	    <p align="center"><a name="f5"></a><img src="../img/revistas/rmib/v32n1/a5f5.jpg"></p>     <p align="justify"><font face="verdana" size="2"><i>The fitness function</i></font></p>  	    <p align="justify"><font face="verdana" size="2">The fitness function of a chromosome directly affects the performance of the GA. 
In this case, an SVM classifier provides the classification accuracy of each chromosome as follows:</font></p>  	    <p align="center"><img src="../img/revistas/rmib/v32n1/a5e3.jpg"></p>     <p align="justify"><font face="verdana" size="2">Where x is a candidate subset and <i>accuracy<sub>SVM</sub></i> is the classification accuracy of the SVM built on x.</font></p>  	    <p align="justify"><font face="verdana" size="2"><i>Selection, crossover, mutation, replacement and stopping criteria</i></font></p>  	    ]]></body>
<body><![CDATA[<p align="justify"><font face="verdana" size="2">In this work, a selection mechanism based on the roulette wheel is used. For the crossover operator, multi&#45;uniform point crossover is used (Pc). For mutation, each chromosome has a low probability of mutating (Pm). An elitism mechanism is also applied to conserve the top 10 or 15% of the population. Finally, the stopping criterion is a predefined number of generations.</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Support vector machine (SVM)</b></font></p>  	    <p align="justify"><font face="verdana" size="2">SVM is a powerful data mining technique developed by Vapnik in the mid&#45;1960s. It has been used in many classification and regression applications<sup>14,15</sup>. SVMs have been successfully applied in diverse fields such as fraud detection, direct marketing and text mining, and more recently to high&#45;dimensional data such as gene expression in bioinformatics.</font></p>  	    <p align="justify"><font face="verdana" size="2">The mathematical basis for SVM is derived from statistical learning theory. The training set is assumed to be a finite set of N data/class pairs defined as follows:</font></p>  	    <p align="center"><img src="../img/revistas/rmib/v32n1/a5e4.jpg"></p> 	    <p align="center"><img src="../img/revistas/rmib/v32n1/a5f6.jpg"></p>     <p align="justify"><font face="verdana" size="2">Where<i> <img src="../img/revistas/rmib/v32n1/a5i3.jpg"></i> (data) and <i>y<sub>i</sub></i> &#8712;{&plusmn;1} (classes). 
The SVM projects <i>x</i> to <i>z = &phi;(x)</i> in a Hilbert Space H by a nonlinear map <img src="../img/revistas/rmib/v32n1/a5i4.jpg"></font></p>      <p align="justify"><font face="verdana" size="2">If we assume that the data are linearly separable, i.e., that there exist <img src="../img/revistas/rmib/v32n1/a5i5.jpg">such that:</font></p>      <p align="center"><img src="../img/revistas/rmib/v32n1/a5e5.jpg"></p>     <p align="justify"><font face="verdana" size="2">For a given linear classifier <img src="../img/revistas/rmib/v32n1/a5i6.jpg"><i>,</i> consider the hyperplane defined by the values &#45;1 and +1 of the decision function:</font></p>      ]]></body>
<body><![CDATA[<p align="justify"><font face="verdana" size="2">Indeed, the points<i> <img src="../img/revistas/rmib/v32n1/a5i7.jpg"></i> satisfy the following condition:</font></p>      <p align="center"><img src="../img/revistas/rmib/v32n1/a5e6.jpg"></p>     <p align="justify"><font face="verdana" size="2">By subtracting we get: <i><img src="../img/revistas/rmib/v32n1/a5i8.jpg">,</i> and therefore: </font></p>     <p align="center"><font size="2" face="verdana"><img src="../img/revistas/rmib/v32n1/a5e7.jpg"></font></p>      <p align="justify"><font face="verdana" size="2">Where<b> <img src="../img/revistas/rmib/v32n1/a5i9.jpg"></b> is the margin obtained from the largest separating hyperplane. All training points should be on the correct side of the dotted line in <a href="#f2">Figure 2</a>. For positive examples (<i>y<sub>i</sub></i>=1) this means:<b><i> <img src="../img/revistas/rmib/v32n1/a5i10.jpg"></i></b> and for negative examples (y<sub>i</sub>=&#45;1) this means<b> <img src="../img/revistas/rmib/v32n1/a5i11.jpg"></b>; both cases are summarized as follows:</font></p>      <p align="center"><img src="../img/revistas/rmib/v32n1/a5e8.jpg"></p>     <p align="justify"><font face="verdana" size="2">Finally, an optimal separation can be achieved by the hyperplane that has the greatest distance to the neighbouring data points of both classes (optimal hyperplane). For this, it is necessary to find:<b><i> <img src="../img/revistas/rmib/v32n1/a5i12.jpg"></i></b> which minimize:<b> <img src="../img/revistas/rmib/v32n1/a5i13.jpg"></b> under the constraint:</font></p>      <p align="center"><img src="../img/revistas/rmib/v32n1/a5e9.jpg"></p>     <p align="justify"><font face="verdana" size="2">This problem can be reduced to a quadratic programming problem. 
For more details on how the problem is solved, please refer to<sup>15</sup>.</font></p>  	    <p align="justify"><font face="verdana" size="2">In this work an SVM classifier is used to assess the quality (accuracy) of a subset of genes. To avoid overfitting, the SVM error is estimated by leave&#45;one&#45;out cross&#45;validation (LOOCV).</font></p>     ]]></body>
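<body><![CDATA[<p align="justify"><font face="verdana" size="2">The GA wrapper described in this section can be sketched as follows, as a minimal illustration rather than the implementation used in the experiments. Chromosomes are bit lists, the fitness is the LOOCV accuracy on the selected genes, selection uses a roulette wheel, and elitism keeps the best chromosomes. Three simplifications are assumptions of this sketch: a nearest&#45;centroid classifier stands in for the SVM so that the example needs only the standard library, plain uniform crossover stands in for the multi&#45;uniform point crossover, and each mating produces one child instead of two. All names and default parameter values are hypothetical.</font></p>
<pre>
```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT an SVM): label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if best_dist > dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(data, labels, chromosome):
    """Fitness: leave-one-out accuracy using only genes whose bit is 1."""
    genes = [j for j, bit in enumerate(chromosome) if bit == 1]
    if not genes:
        return 0.0
    rows = [[row[j] for j in genes] for row in data]
    hits = 0
    for i in range(len(rows)):
        train_x = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, rows[i]) == labels[i]:
            hits += 1
    return hits / len(rows)


def roulette(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitness)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for chromosome, fit in zip(population, fitness):
        running += fit
        if running >= pick:
            return chromosome
    return population[-1]


def run_ga(data, labels, pop_size=20, generations=15, pc=0.9, pm=0.02,
           elite=2, seed=0):
    """Evolve bit-string gene subsets; return (best chromosome, its fitness)."""
    rng = random.Random(seed)
    length = len(data[0])
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [loocv_accuracy(data, labels, c) for c in population]
        ranked = sorted(zip(fitness, population), key=lambda t: -t[0])
        children = [list(c) for _, c in ranked[:elite]]  # elitism
        while pop_size > len(children):
            a = roulette(population, fitness, rng)
            b = roulette(population, fitness, rng)
            child = list(a)
            if pc > rng.random():  # uniform crossover with probability pc
                child = [x if rng.random() > 0.5 else y for x, y in zip(a, b)]
            # Bit-flip mutation with probability pm per bit.
            child = [1 - bit if pm > rng.random() else bit for bit in child]
            children.append(child)
        population = children
    fitness = [loocv_accuracy(data, labels, c) for c in population]
    best = max(range(pop_size), key=lambda i: fitness[i])
    return population[best], fitness[best]
```
</pre>
<p align="justify"><font face="verdana" size="2">Replacing <i>nearest_centroid_predict</i> with a call to an actual SVM implementation would bring the sketch closer to the method described here; the rest of the loop would stay the same.</font></p>]]></body>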
<body><![CDATA[<p align="justify">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><b>EXPERIMENTAL SETUP AND RESULTS</b></font></p>  	    <p align="justify"><font face="verdana" size="2"><b>A. Datasets used</b></font></p>  	    <p align="justify"><font face="verdana" size="2">In this study, we analyze two well&#45;known public microarray datasets obtained from Affymetrix oligonucleotide microarrays: the Colon cancer and Leukemia datasets. These two datasets have been widely used as benchmarks for many supervised learning techniques in bioinformatics.</font></p>  	    <p align="justify"><font face="verdana" size="2">Colon cancer data: This dataset consists of 62 samples (tissues) collected from colon cancer patients (40 tumor samples and 22 normal samples), for which 6,500 human genes were measured using Affymetrix technology. A selection of 2,000 genes with the highest minimal intensity across the samples was made by Alon et al<sup>1</sup>. This dataset can be downloaded from the website: <a href="http://genomicspubs.princeton.edu/oncology/affydata/index.html" target="_blank">http://genomicspubs.princeton.edu/oncology/affydata/index.html</a>.</font></p>      <p align="justify"><font face="verdana" size="2">Leukemia dataset: This dataset, described by Golub et al<sup>3</sup>, is used for classification. The biological task is to distinguish two types of leukemia: acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). The leukemia dataset includes expression levels of 7,129 human genes produced by Affymetrix technology for 72 patients (47 ALL samples and 25 AML samples). Tissue samples were collected at the time of diagnosis, before treatment, taken either from bone marrow (62 cases) or peripheral blood (10 cases), and reflect both childhood and adult leukemia. As in the original paper, the data were divided into a training set of 38 samples (27 ALL and 11 AML) and a test set of 34 samples (20 ALL and 14 AML). 
This dataset was obtained directly from the website: <a href="http://www.broadinstitute.9org/cgibin/cancer/publications/pub_paper.cgi?mode=view&amp;paper_id = 43" target="_blank">www.broadinstitute.9org/cgibin/cancer/publications/pub_paper.cgi?mode=view&paper_id = 43</a>.</font></p>      <p align="justify"><font face="verdana" size="2"><b>B. Genetic algorithm parameters</b></font></p>  	    <p align="justify"><font face="verdana" size="2">The genetic algorithm is implemented in MATLAB (version 7.6.0), and the SVM uses a toolbox from the same MATLAB environment. The parameters of the genetic algorithm are shown in <a href="#t1">Table 1</a>.</font></p> 	    <p align="center"><font size="2" face="verdana"><a name="t1"></a><img src="../img/revistas/rmib/v32n1/a5t1.jpg"></font></p>     <p align="justify"><font face="verdana" size="2"><b>C. Experimental results</b></font></p>  	    ]]></body>
<body><![CDATA[<p align="justify"><font face="verdana" size="2">In the experimental protocol, the DNA&#45;microarray data are first reduced by three filtering methods, each of which is evaluated separately by the GA/SVM framework. The gene subsets obtained at each stage are added to a file (niche) in order to assess the frequency of each gene. A new stage is then carried out with the most frequent genes, which are re&#45;evaluated by the GA/SVM to obtain a final gene subset. The proposed method is compared with several results reported in the literature.</font></p>  	    <p align="justify"><font face="verdana" size="2"><a href="#t2">Table 2</a> summarizes the best accuracies obtained by the MF&#45;GA&#45;SVM method. The first column indicates the work reported in the literature. The second and third columns show the accuracy obtained for the leukemia and colon cancer datasets, respectively. Each cell contains the classification accuracy and the minimal gene subset.</font></p> 	    <p align="center"><a name="t2"></a><img src="../img/revistas/rmib/v32n1/a5t2.jpg"></p>      <p align="justify"><font face="verdana" size="2">The genetic algorithm is run 10 times, which yields an accuracy of 98.61% for leukemia with only 10 genes. For colon, an accuracy of 98.38% is obtained, also with 10 genes.</font></p>  	    <p align="justify"><font face="verdana" size="2">Finally, the top five selected genes found for each dataset are shown in <a href="../img/revistas/rmib/v32n1/a5t3.jpg" target="_blank">Tables 3</a> and <a href="../img/revistas/rmib/v32n1/a5t4.jpg" target="_blank">4</a>. For the Leukemia dataset, the top gene is APLP2 Amyloid beta (<a href="../img/revistas/rmib/v32n1/a5t3.jpg" target="_blank">Table 3</a>). For the colon dataset, the most relevant gene is Human desmin gene, complete cds (<a href="../img/revistas/rmib/v32n1/a5t4.jpg" target="_blank">Table 4</a>). 
For both datasets, all of the selected genes have previously been reported in the literature.</font></p> 	    <p align="justify">&nbsp;</p>     <p align="justify"><font face="verdana" size="2"><b>CONCLUSIONS</b></font></p>  	    <p align="justify"><font face="verdana" size="2">In this paper, a multiple&#45;filter&#45;GA/SVM method was presented for selecting a final gene subset with high classification accuracy. Three filtering methods are used to make an initial reduction in the size of the database; the wrapper then uses a hybrid model based on a genetic algorithm combined with an SVM classifier. The proposed method determines a small subset of genes with an accuracy of 98.61% and 98.38% for leukemia and colon, respectively. The number of genes found with the proposed model is equal to or slightly smaller than that of the subsets reported in the literature, which are shown in <a href="#t2">Table 2</a>. The goal is to achieve 100% classification accuracy with a smaller set of genes.</font></p>  	    <p align="justify"><font face="verdana" size="2">In this paper, the selection of a gene subset for cancer classification was performed with a GA/SVM wrapper combined with three feature&#45;ranking filters. The databases used are very large, which makes it infeasible to select the best data for evaluation directly. They also contain data on different numerical scales, which hinders feature selection and therefore the identification of the best genes to be evaluated. The system was tested on two databases, where it proved highly effective at selecting genes: the tests performed with leave&#45;one&#45;out cross&#45;validation (LOOCV) give better classification performance on the selected data.</font></p>     <p align="justify"><font face="verdana" size="2">This approach can be further improved in several respects. One way involves finding smaller gene subsets with higher classification accuracy. 
Another way is to include multiple or multi&#45;objective criteria. In the future, more gene selection filters could be incorporated, such as mutual information or the minimal&#45;redundancy&#45;maximal&#45;relevance method.</font></p>     ]]></body>
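<body><![CDATA[<p align="justify"><font face="verdana" size="2">The niche step of the experimental protocol (pooling the gene subsets returned by the GA/SVM runs and keeping the most frequent genes) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the gene identifiers and the cut&#45;off <i>top_k</i> are hypothetical, and the article does not state how ties in frequency are resolved, so the alphabetical tie&#45;break here is an arbitrary choice.</font></p>
<pre>
```python
from collections import Counter
from typing import Iterable, List


def most_frequent_genes(subsets: Iterable[Iterable[str]], top_k: int) -> List[str]:
    """Pool gene subsets into one niche and return the top_k most frequent genes."""
    # Count each gene at most once per subset (one subset = one GA/SVM run).
    counts = Counter(gene for subset in subsets for gene in set(subset))
    # Sort by descending frequency, then by name so the result is reproducible.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_k]]


# Hypothetical output of four GA/SVM runs (gene identifiers are made up).
runs = [
    ["g12", "g7", "g301"],
    ["g7", "g45", "g12"],
    ["g7", "g88", "g301"],
    ["g12", "g7", "g9"],
]

niche_top = most_frequent_genes(runs, top_k=3)
```
</pre>
<p align="justify"><font face="verdana" size="2">The genes returned by such a step would then be passed once more to the GA/SVM wrapper to obtain the final subset.</font></p>
]]></body>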
<body><![CDATA[<p align="justify">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><b>ACKNOWLEDGEMENTS</b></font></p>  	    <p align="justify"><font face="verdana" size="2">This work was carried out within the PROMEP project ITAPI&#45;EXB&#45;000.</font></p> 	    <p align="justify">&nbsp;</p>      <p align="justify"><font face="verdana" size="2"><b>REFERENCES</b></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">1.&nbsp;Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA 1999: 6745&#45;6750.<!-- end-ref --></font></p>     <!-- ref --><p align="justify"><font face="verdana" size="2">2.&nbsp;Ben&#45;Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z. Tissue classification with gene expression profiles. In: RECOMB, Journal of Computational Biology 2000: 54&#45;64.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">3.&nbsp;Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 1999; 286: 531&#45;537.    
<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">4.&nbsp;Alizadeh A, Eisen M, Davis R, Ma C, Lossos I, Rosenwald A et al. Distinct types of diffuse large B&#45;cell lymphoma identified by gene expression profiling. Nature 2000; 403(6769): 503&#45;11.<!-- end-ref --></font></p>     <!-- ref --><p align="justify"><font face="verdana" size="2">5.&nbsp;Shipp M, Ross K, Tamayo P, Weng A, Kutok J, Aguiar R et al. Diffuse large B&#45;cell lymphoma outcome prediction by gene expression profiling and supervised machine learning. Nat Med 2002: 68&#45;74.<!-- end-ref --></font></p>     <!-- ref --><p align="justify"><font face="verdana" size="2">6.&nbsp;Schena M, Shalon D, Davis R, Brown P. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995; 270: 467&#45;470.    
<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">7.&nbsp;Bobashev GV, Das S, Das A. Experimental design for gene microarray experiment and differential expression analyses. Methods of Microarray Data Analysis II 2001: 23&#45;41.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">8.&nbsp;McLachlan GJ, Do KA, Ambroise C. Analyzing microarray gene expression data. Wiley. 2004.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">9.&nbsp;Russell S, Meadows LA, Russell RR. Microarray technology in practice. Academic Press. First edition. 2009.    
<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">10.&nbsp;Dudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 2002; 97(457): 77&#45;87.<!-- end-ref --></font></p>     <!-- ref --><p align="justify"><font face="verdana" size="2">11.&nbsp;Deng L, Pei J, Ma J, Lee DL. Rank sum test method for informative gene discovery. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04), 2004: 410&#45;419.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">12.&nbsp;Liu H, Li J, Wong L. A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Informatics 2002; 13: 51&#45;60.    
<!-- end-ref --></font></p>     <p align="justify"><font face="verdana" size="2">13.&nbsp;Mitchell M. An introduction to genetic algorithms. MIT Press (Cambridge, Massachusetts &bull; London, England), 1999.</font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2">14.&nbsp;Burges CJC. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 1998; 2(2): 121&#45;167.<!-- end-ref --></font></p>     <!-- ref --><p align="justify"><font face="verdana" size="2">15.&nbsp;Joachims T. Making large&#45;scale SVM learning practical. In: Sch&ouml;lkopf B et al. (editors). Advances in kernel methods: support vector learning. MIT Press, 1999.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">16.&nbsp;Cho SB, Won HH. Cancer classification using ensemble of neural networks with multiple significant gene subsets. Applied Intelligence 2007; 26(3): 243&#45;250.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">17.&nbsp;Li S, Wu X, Hu X. Gene selection using genetic algorithm and support vectors machines. Soft Comput 2008; 12(7): 693&#45;698.    
<!-- end-ref --></font></p>     <!-- ref --><p align="justify"><font face="verdana" size="2">18.&nbsp;Alba E, Garc&iacute;a&#45;Nieto J, Jourdan L, Talbi EG. Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms. Congress on Evolutionary Computation 2007: </font><font face="verdana" size="2">284&#45;290.<!-- end-ref --></font></p>     ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2">19.&nbsp;Krishnapuram B, Carin L, Hartemink AJ. Joint classifier and feature optimization for comprehensive cancer diagnosis using gene expression data. Journal of Computational Biology 2004; 11(2&#45;3): 227&#45;242.<!-- end-ref --></font></p>     <!-- ref --><p align="justify"><font face="verdana" size="2">20.&nbsp;Xu R, Anagnostopoulos JC, Wunsch DC. Tissue classification through analysis of gene expression data using a new family of ART architectures. Proceedings of the IEEE&#45;INNS&#45;ENNS International Joint Conference on Neural Networks, 2002: 300&#45;304.<!-- end-ref --></font></p>     <!-- ref --><p align="justify"><font face="verdana" size="2">21.&nbsp;Li X, Rao S, Zhang T, Guo Z, Moser KL, Topol EJ et al. An ensemble method for gene discovery based on DNA microarray data. Ser C Life Sciences 2004: 396&#45;405.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">22.&nbsp;Zhang H, Song X, Wang H, Zhang X. 
MIClique: an algorithm to identify differentially co&#45;expressed disease gene subsets from microarray data. Journal of Biomedicine and Biotechnology 2009: 9.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2">23.&nbsp;Cho SB. Exploring features and classifiers to classify gene expression profiles of acute leukemia. International Journal of Pattern Recognition and Artificial Intelligence 2002: 831&#45;844.<!-- end-ref --></font></p> 	    ]]></body>
<body><![CDATA[<p align="justify">&nbsp;</p> 	    <p align="justify"><font size="2" face="verdana"><b>Note</b></font></p>         <p align="justify"><font face="verdana" size="2">The full version of this article can also be consulted at: <a href="http://www.medigraphic.com/ingenieriabiomedica/" target="_blank">http://www.medigraphic.com/ingenieriabiomedica/</a></font></p>      ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alon]]></surname>
<given-names><![CDATA[U]]></given-names>
</name>
<name>
<surname><![CDATA[Barkai]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
<name>
<surname><![CDATA[Notterman]]></surname>
<given-names><![CDATA[DA]]></given-names>
</name>
<name>
<surname><![CDATA[Gish]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[Ybarra]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Mack]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays]]></article-title>
<collab>PNAS</collab>
<source><![CDATA[]]></source>
<year>1999</year>
<page-range>6745-6750</page-range><publisher-name><![CDATA[National Academy of Sciences]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ben-Dor]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Bruhn]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Friedman]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
<name>
<surname><![CDATA[Nachman]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[Schummer]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Yakhini]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Tissue classification with gene expression profiles]]></article-title>
<source><![CDATA[RECOMB, Journal of Computational Biology]]></source>
<year>2000</year>
<page-range>54-64</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Golub]]></surname>
<given-names><![CDATA[TR]]></given-names>
</name>
<name>
<surname><![CDATA[Slonim]]></surname>
<given-names><![CDATA[DK]]></given-names>
</name>
<name>
<surname><![CDATA[Tamayo]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Huard]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Gaasenbeek]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Mesirov]]></surname>
<given-names><![CDATA[JP]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring]]></article-title>
<source><![CDATA[Science]]></source>
<year>1999</year>
<volume>286</volume>
<page-range>531-537</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alizadeh]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Eisen]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Davis]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Ma]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Lossos]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[Rosenwald]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling]]></article-title>
<source><![CDATA[Nature]]></source>
<year>2000</year>
<volume>403</volume>
<numero>6769</numero>
<issue>6769</issue>
<page-range>503-11</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shipp]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Ross]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[Tamayo]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Weng]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Kutok]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Aguiar]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning]]></article-title>
<source><![CDATA[Nat Med]]></source>
<year>2002</year>
<page-range>68-74</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Schena]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Shalon]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Davis]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Brown]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Quantitative monitoring of gene expression patterns with a complementary DNA microarray]]></article-title>
<source><![CDATA[Science]]></source>
<year>1995</year>
<volume>270</volume>
<page-range>467-470</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bobashev]]></surname>
<given-names><![CDATA[GV]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Experimental design for gene microarray experiment and differential expression analyses]]></article-title>
<source><![CDATA[Methods of Microarray Data Analysis II]]></source>
<year>2001</year>
<page-range>23-41</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[McLachlan]]></surname>
<given-names><![CDATA[GJ]]></given-names>
</name>
<name>
<surname><![CDATA[Do]]></surname>
<given-names><![CDATA[KA]]></given-names>
</name>
<name>
<surname><![CDATA[Ambroise]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[Analyzing microarray gene expression data]]></source>
<year>2004</year>
<publisher-name><![CDATA[Wiley]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Russell]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Meadows]]></surname>
<given-names><![CDATA[LA]]></given-names>
</name>
<name>
<surname><![CDATA[Russell]]></surname>
<given-names><![CDATA[RR.]]></given-names>
</name>
</person-group>
<source><![CDATA[Microarray technology in practice]]></source>
<year>2009</year>
<edition>First</edition>
<publisher-name><![CDATA[Academic Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dudoit]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Fridlyand]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Speed]]></surname>
<given-names><![CDATA[TP]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Comparison of discrimination methods for the classification of tumors using gene expression data]]></article-title>
<source><![CDATA[Journal of the American Statistical Association]]></source>
<year>2002</year>
<volume>97</volume>
<numero>457</numero>
<issue>457</issue>
<page-range>77-87</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Deng]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Pei]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Ma]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[DL.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Rank sum test method for informative gene discovery]]></article-title>
<source><![CDATA[Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04)]]></source>
<year>2004</year>
<page-range>410-419</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Wong]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns]]></article-title>
<source><![CDATA[Genome Informatics]]></source>
<year>2002</year>
<volume>13</volume>
<page-range>51-60</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mitchell]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[An introduction to genetic algorithms]]></source>
<year>1999</year>
<publisher-loc><![CDATA[Cambridge, Massachusetts; London, England]]></publisher-loc>
<publisher-name><![CDATA[MIT Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Burges]]></surname>
<given-names><![CDATA[CJC.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A tutorial on support vector machines for pattern recognition]]></article-title>
<source><![CDATA[Data Mining and Knowledge Discovery]]></source>
<year>1998</year>
<volume>2</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>121-167</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Joachims]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Schölkopf]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<source><![CDATA[Making large-scale SVM learning practical. Advances in kernel methods-support vector learning]]></source>
<year>1999</year>
<publisher-name><![CDATA[MIT Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[SB]]></given-names>
</name>
<name>
<surname><![CDATA[Won]]></surname>
<given-names><![CDATA[HH.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Cancer classification using ensemble of neural networks with multiple significant gene subsets]]></article-title>
<source><![CDATA[Applied Intelligence]]></source>
<year>2007</year>
<volume>26</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>243-250</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Wu]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
<name>
<surname><![CDATA[Hu]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Gene selection using genetic algorithm and support vectors machines]]></article-title>
<source><![CDATA[Soft Comput]]></source>
<year>2008</year>
<volume>12</volume>
<numero>7</numero>
<issue>7</issue>
<page-range>693-698</page-range></nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alba]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
<name>
<surname><![CDATA[García-Nieto]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Jourdan]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Talbi]]></surname>
<given-names><![CDATA[EG.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms]]></article-title>
<source><![CDATA[Congress on Evolutionary Computation]]></source>
<year>2007</year>
<page-range>284-290</page-range></nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Krishnapuram]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[Carin]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Hartemink]]></surname>
<given-names><![CDATA[AJ.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Joint classifier and feature optimization for comprehensive cancer diagnosis using gene expression data]]></article-title>
<source><![CDATA[Journal of Computational Biology]]></source>
<year>2004</year>
<volume>11</volume>
<numero>2-3</numero>
<issue>2-3</issue>
<page-range>227-242</page-range></nlm-citation>
</ref>
<ref id="B20">
<label>20</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Xu]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Anagnostopoulos]]></surname>
<given-names><![CDATA[JC]]></given-names>
</name>
<name>
<surname><![CDATA[Wunsch]]></surname>
<given-names><![CDATA[DC.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Tissue classification through analysis of gene expression data using a new family of ART architectures]]></article-title>
<source><![CDATA[Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks]]></source>
<year>2002</year>
<page-range>300-304</page-range></nlm-citation>
</ref>
<ref id="B21">
<label>21</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
<name>
<surname><![CDATA[Rao]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Guo]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
<name>
<surname><![CDATA[Moser]]></surname>
<given-names><![CDATA[KL]]></given-names>
</name>
<name>
<surname><![CDATA[Topol]]></surname>
<given-names><![CDATA[EJ]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[An ensemble method for gene discovery based on DNA microarray data]]></article-title>
<source><![CDATA[Science in China Series C: Life Sciences]]></source>
<year>2004</year>
<page-range>396-405</page-range></nlm-citation>
</ref>
<ref id="B22">
<label>22</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Song]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[MIClique: an algorithm to identify differentially co-expressed disease gene subsets from microarray data]]></article-title>
<source><![CDATA[Journal of Biomedicine and Biotechnology]]></source>
<year>2009</year>
<page-range>9</page-range></nlm-citation>
</ref>
<ref id="B23">
<label>23</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[SB.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Exploring features and classifiers to classify gene expression profiles of acute leukemia]]></article-title>
<source><![CDATA[International Journal of Pattern Recognition and Artificial Intelligence]]></source>
<year>2002</year>
<page-range>831-844</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
