<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462014000100011</article-id>
<article-id pub-id-type="doi">10.13053/CyS-18-1-2014-024</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Introducing Biases in Document Clustering]]></article-title>
<article-title xml:lang="es"><![CDATA[Introducción de sesgos en el agrupamiento de documentos]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Ramírez-Cruz]]></surname>
<given-names><![CDATA[Yunior]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Center for Pattern Recognition and Data Mining, Content Management Systems Division]]></institution>
<addr-line><![CDATA[Santiago de Cuba ]]></addr-line>
<country>Cuba</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>03</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>03</month>
<year>2014</year>
</pub-date>
<volume>18</volume>
<numero>1</numero>
<fpage>137</fpage>
<lpage>151</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462014000100011&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462014000100011&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462014000100011&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[In this paper, we present three criteria for introducing biases in document clustering algorithms, when information characterizing the document collections is available. We focus on collections known to be the result of a document categorization or sample-based document filtering process. Our proposals rely on profiles, i.e., document samples known to have been used for obtaining the collection, to extract statistics which determine the biases to introduce. We conduct an experimental evaluation over a number of collections extracted from the widely used corpus RCV1, which allows us to confirm the validity of our proposals and determine a number of situations where biased clusterings, according to different criteria, outperform their unbiased counterparts.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[En este artículo se presentan tres criterios para la introducción de sesgos en algoritmos de agrupamiento de documentos, cuando se dispone de información que caracteriza las colecciones de documentos. Nos concentramos en colecciones de las que se conoce que son el resultado de un proceso de categorización o filtrado de documentos basado en muestras. Nuestras propuestas utilizan perfiles, es decir muestras de documentos de las que se conoce que han sido utilizadas para obtener la colección, para extraer estadísticos que determinan los sesgos a introducir. Llevamos a cabo una evaluación experimental sobre un conjunto de colecciones extraídas del corpus ampliamente utilizado RCV1, que nos permiten confirmar la validez de nuestras propuestas y determinar un número de situaciones donde los agrupamientos sesgados según diferentes criterios superan a sus contrapartes no sesgadas.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Document clustering]]></kwd>
<kwd lng="en"><![CDATA[introducing biases]]></kwd>
<kwd lng="es"><![CDATA[Agrupamiento de documentos]]></kwd>
<kwd lng="es"><![CDATA[introducción de sesgos]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[  	    <p align="justify"><font face="verdana" size="4">Art&iacute;culos</font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="center"><font face="verdana" size="4"><b>Introducing Biases in Document Clustering</b></font></p>  	    <p align="center"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="center"><font face="verdana" size="3"><b>Introducci&oacute;n de sesgos en el agrupamiento de documentos</b></font></p>  	    <p align="center"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="center"><font face="verdana" size="2"><b>Yunior Ram&iacute;rez&#45;Cruz</b></font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><i>Center for Pattern Recognition and Data Mining, Content Management Systems Division, DATYS, Santiago de Cuba, Cuba</i>. <a href="mailto:yunior@cerpamid.co.cu">yunior@cerpamid.co.cu</a></font></p>  	    ]]></body>
<body><![CDATA[<p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Abstract</b></font></p>  	    <p align="justify"><font face="verdana" size="2">In this paper, we present three criteria for introducing biases in document clustering algorithms, when information characterizing the document collections is available. We focus on collections known to be the result of a document categorization or sample&#45;based document filtering process. Our proposals rely on profiles, i.e., document samples known to have been used for obtaining the collection, to extract statistics which determine the biases to introduce. We conduct an experimental evaluation over a number of collections extracted from the widely used corpus RCV1, which allows us to confirm the validity of our proposals and determine a number of situations where biased clusterings, according to different criteria, outperform their unbiased counterparts.</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Keywords.</b> Document clustering, introducing biases.</font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Resumen</b></font></p>  	    <p align="justify"><font face="verdana" size="2">En este art&iacute;culo se presentan tres criterios para la introducci&oacute;n de sesgos en algoritmos de agrupamiento de documentos, cuando se dispone de informaci&oacute;n que caracteriza las colecciones de documentos. Nos concentramos en colecciones de las que se conoce que son el resultado de un proceso de categorizaci&oacute;n o filtrado de documentos basado en muestras. Nuestras propuestas utilizan perfiles, es decir muestras de documentos de las que se conoce que han sido utilizadas para obtener la colecci&oacute;n, para extraer estad&iacute;sticos que determinan los sesgos a introducir. 
Llevamos a cabo una evaluaci&oacute;n experimental sobre un conjunto de colecciones extra&iacute;das del corpus ampliamente utilizado RCV1, que nos permiten confirmar la validez de nuestras propuestas y determinar un n&uacute;mero de situaciones donde los agrupamientos sesgados seg&uacute;n diferentes criterios superan a sus contrapartes no sesgadas.</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Palabras clave.</b> Agrupamiento de documentos, introducci&oacute;n de sesgos.</font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><a href="/pdf/cys/v18n1/v18n1a11.pdf" target="_blank">DESCARGAR ART&Iacute;CULO EN FORMATO PDF</a></font></p>  	    ]]></body>
<body><![CDATA[<p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>References</b></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>1. Carpineto, C., Osinski, S., Romano, G. &amp; Weiss, D. (2009).</b> A Survey of Web Clustering Engines. <i>ACM Computing Surveys</i> 41(3).<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>2. Ram&iacute;rez&#45;Cruz, Y. (2013).</b> Assessing the Effect of Introducing Biases in Document Clustering. <i>Proceedings of the XV International Convention and Fair Inform&aacute;tica 2013.</i><!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>3. Kyriakopoulou, A. &amp; Kalamboukis, T. (2006).</b> Text Classification Using Clustering. <i>Proceedings of the Discovery Challenge Workshop at ECML/PKDD 2006,</i> 28&#45;38.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>4. Kalton, A., Wagstaff, K. &amp; Yoo, J. (2001).</b> Generalized Clustering, Supervised Learning, and Data Assignment. <i>Proceedings of the ACM SIGKDD Seventh International Conference on Knowledge Discovery and Data Mining,</i> 299&#45;304.<!-- end-ref --></font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2"><b>5. Hartigan, J. &amp; Wong, M. (1979).</b> Algorithm AS136: A K&#45;Means clustering algorithm. <i>Applied Statistics</i> 28, 100&#45;108.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>6. Palmer, C. &amp; Faloutsos, C. (2000).</b> Density biased sampling: An improved method for data mining and clustering. <i>Proceedings of the ACM SIGMOD 19th International Conference on Management of Data,</i> 82&#45;92.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>7. Salton, G., Wong, A. &amp; Yang, C. S. (1975).</b> A Vector Space Model for Automatic Indexing. <i>Communications of the ACM</i> 18(11), 613&#45;620.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>8. Lidstone, G.J. (1920).</b> Note on the General Case of the Bayes&#45;Laplace Formula for Inductive or a Posteriori Probabilities. <i>Transactions of the Faculty of Actuaries</i> 8, 182&#45;192.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>9. Lewis, D.D., Yang, Y., Rose, T. &amp; Li, F. (2004).</b> RCV1: A New Benchmark Collection for Text Categorization Research. <i>Journal of Machine Learning Research</i> 5, 361&#45;397.<!-- end-ref --></font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2"><b>10. Halkidi, M., Batistakis, Y. &amp; Vazirgiannis, M. (2001).</b> On Clustering Validation Techniques. <i>Journal of Intelligent Information Systems</i> 17(2&#45;3), 107&#45;145.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>11. van Rijsbergen, C.J. (1979).</b> <i>Information Retrieval,</i> London: Butterworths.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>12. L&oacute;pez&#45;Caviedes, M. &amp; S&aacute;nchez&#45;D&iacute;az, G. (2004).</b> A New Clustering Criterion in Pattern Recognition. <i>WSEAS Transactions on Computers</i> 3(3), 558&#45;562.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>13. Hill, D.R. (1968).</b> A vector clustering technique. <i>Mechanized Information Storage, Retrieval and Dissemination.</i><!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>14. Mart&iacute;nez&#45;Trinidad, J.F., Ruiz&#45;Shulcloper, J. &amp; Lazo&#45;Cort&eacute;s, M. (2000).</b> Structuralization of Universes. <i>Fuzzy Sets and Systems</i> 112(3), 485&#45;500.<!-- end-ref --></font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2"><b>15. Gil&#45;Garc&iacute;a, R., Bad&iacute;a&#45;Contelles, J.M. &amp; Pons&#45;Porrata, A. (2003).</b> Extended Star Clustering Algorithm. <i>Lecture Notes in Computer Science</i> 2905, 480&#45;487.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>16. Pons&#45;Porrata, A., S&aacute;nchez&#45;D&iacute;az, G., Lazo&#45;Cort&eacute;s, M. &amp; Alfonso&#45;Ram&iacute;rez, L. (2005).</b> An Incremental Clustering Algorithm based on Compact Sets with Radius alpha. <i>Lecture Notes in Computer Science</i> 3773, 302&#45;310.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>17. Sparck&#45;Jones, K. (1972).</b> A Statistical Interpretation of Term Specificity and Its Application in Retrieval. <i>Journal of Documentation</i> 28(1), 11&#45;21.<!-- end-ref --></font></p>
<!-- ref --><p align="justify"><font face="verdana" size="2"><b>18. Efron, B. &amp; Tibshirani, R. (1993).</b> <i>An Introduction to the Bootstrap.</i> London: Chapman and Hall/CRC Press.<!-- end-ref --></font></p>      ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Carpineto]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Osinski]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Romano]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Weiss]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A Survey of Web Clustering Engines]]></article-title>
<source><![CDATA[ACM Computing Surveys]]></source>
<year>2009</year>
<volume>41</volume>
<numero>3</numero>
<issue>3</issue>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ramírez-Cruz]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Assessing the Effect of Introducing Biases in Document Clustering]]></article-title>
<source><![CDATA[Proceedings of the XV International Convention and Fair Informática]]></source>
<year>2013</year>
</nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kyriakopoulou]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Kalamboukis]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Text Classification Using Clustering]]></article-title>
<source><![CDATA[Proceedings of the Discovery Challenge Workshop at ECML/PKDD]]></source>
<year>2006</year>
<page-range>28-38</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kalton]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Wagstaff]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Yoo]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Generalized Clustering, Supervised Learning, and Data Assignment]]></article-title>
<source><![CDATA[Proceedings of the ACM SIGKDD Seventh International Conference on Knowledge Discovery and Data Mining]]></source>
<year>2001</year>
<page-range>299-304</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hartigan]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Wong]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Algorithm AS136: A K-Means clustering algorithm]]></article-title>
<source><![CDATA[Applied Statistics]]></source>
<year>1979</year>
<volume>28</volume>
<page-range>100-108</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Palmer]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Faloutsos]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Density biased sampling: An improved method for data mining and clustering]]></article-title>
<source><![CDATA[Proceedings of the ACM SIGMOD 19th International Conference on Management of Data]]></source>
<year>2000</year>
<page-range>82-92</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Salton]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Wong]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[C. S.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A Vector Space Model for Automatic Indexing]]></article-title>
<source><![CDATA[Communications of the ACM]]></source>
<year>1975</year>
<volume>18</volume>
<numero>11</numero>
<issue>11</issue>
<page-range>613-620</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lidstone]]></surname>
<given-names><![CDATA[G.J.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Note on the General Case of the Bayes-Laplace Formula for Inductive or a Posteriori Probabilities]]></article-title>
<source><![CDATA[Transactions of the Faculty of Actuaries]]></source>
<year>1920</year>
<volume>8</volume>
<page-range>182-192</page-range></nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lewis]]></surname>
<given-names><![CDATA[D.D.]]></given-names>
</name>
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Rose]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[RCV1: A New Benchmark Collection for Text Categorization Research]]></article-title>
<source><![CDATA[Journal of Machine Learning Research]]></source>
<year>2004</year>
<volume>5</volume>
<page-range>361-397</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Halkidi]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Batistakis]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Vazirgiannis]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[On Clustering Validation Techniques]]></article-title>
<source><![CDATA[Journal of Intelligent Information Systems]]></source>
<year>2001</year>
<volume>17</volume>
<numero>2-3</numero>
<issue>2-3</issue>
<page-range>107-145</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[van Rijsbergen]]></surname>
<given-names><![CDATA[C.J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Information Retrieval]]></source>
<year>1979</year>
<publisher-loc><![CDATA[London ]]></publisher-loc>
<publisher-name><![CDATA[Butterworths]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[López-Caviedes]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sánchez-Díaz]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A New Clustering Criterion in Pattern Recognition]]></article-title>
<source><![CDATA[WSEAS Transactions on Computers]]></source>
<year>2004</year>
<volume>3</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>558-562</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hill]]></surname>
<given-names><![CDATA[D.R.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A vector clustering technique]]></article-title>
<source><![CDATA[Mechanized Information Storage, Retrieval and Dissemination]]></source>
<year>1968</year>
</nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Martínez-Trinidad]]></surname>
<given-names><![CDATA[J.F.]]></given-names>
</name>
<name>
<surname><![CDATA[Ruiz-Shulcloper]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Lazo-Cortés]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Structuralization of Universes]]></article-title>
<source><![CDATA[Fuzzy Sets and Systems]]></source>
<year>2000</year>
<volume>112</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>485-500</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gil-García]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Badía-Contelles]]></surname>
<given-names><![CDATA[J.M.]]></given-names>
</name>
<name>
<surname><![CDATA[Pons-Porrata]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Extended Star Clustering Algorithm]]></article-title>
<source><![CDATA[Lecture Notes in Computer Science]]></source>
<year>2003</year>
<volume>2905</volume>
<page-range>480-487</page-range></nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pons-Porrata]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Sánchez-Díaz]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Lazo-Cortés]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Alfonso-Ramírez]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[An Incremental Clustering Algorithm based on Compact Sets with Radius alpha]]></article-title>
<source><![CDATA[Lecture Notes in Computer Science]]></source>
<year>2005</year>
<volume>3773</volume>
<page-range>302-310</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sparck-Jones]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A Statistical Interpretation of Term Specificity and Its Application in Retrieval]]></article-title>
<source><![CDATA[Journal of Documentation]]></source>
<year>1972</year>
<volume>28</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>11-21</page-range></nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Efron]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Tibshirani]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<source><![CDATA[An Introduction to the Bootstrap]]></source>
<year>1993</year>
<publisher-loc><![CDATA[London ]]></publisher-loc>
<publisher-name><![CDATA[Chapman and Hall/CRC Press]]></publisher-name>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
