<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462020000200429</article-id>
<article-id pub-id-type="doi">10.13053/cys-24-2-3369</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Comparison of Clustering Algorithms in Text Clustering Tasks]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gallardo García]]></surname>
<given-names><![CDATA[Rafael]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Beltrán]]></surname>
<given-names><![CDATA[Beatriz]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
<xref ref-type="aff" rid="Aaf"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Vilariño]]></surname>
<given-names><![CDATA[Darnes]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
<xref ref-type="aff" rid="Aaf"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Zepeda]]></surname>
<given-names><![CDATA[Claudia]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Martínez]]></surname>
<given-names><![CDATA[Rodolfo]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Benemérita Universidad Autónoma de Puebla, Faculty of Computer Science]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Mexico</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Benemérita Universidad Autónoma de Puebla, Language & Knowledge Engineering Lab]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Mexico</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>06</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>06</month>
<year>2020</year>
</pub-date>
<volume>24</volume>
<numero>2</numero>
<fpage>429</fpage>
<lpage>437</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462020000200429&amp;lng=en&amp;nrm=iso"></self-uri>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462020000200429&amp;lng=en&amp;nrm=iso"></self-uri>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462020000200429&amp;lng=en&amp;nrm=iso"></self-uri>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[This paper compares the performance and accuracy of several clustering algorithms in text clustering tasks. Texts were preprocessed with Term Frequency-Inverse Document Frequency (TF-IDF) weighting, which assigns a weight to each word in each text and, from those, weights to each text. Cosine similarity was used as the similarity measure between texts. The clustering tasks were performed on the PAN dataset with three algorithms: Affinity Propagation, K-Means and Spectral Clustering. Results are presented in comparative tables that list the task ID, the ground-truth clusters and the clusters generated by each algorithm, together with a table of precision, recall and F-measure scores.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Affinity propagation]]></kwd>
<kwd lng="en"><![CDATA[f-measure]]></kwd>
<kwd lng="en"><![CDATA[k-means]]></kwd>
<kwd lng="en"><![CDATA[spectral clustering]]></kwd>
<kwd lng="en"><![CDATA[PAN]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Arthur]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Vassilvitskii]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[K-means++: The advantages of careful seeding]]></source>
<year>2007</year>
<volume>8</volume>
<conf-name><![CDATA[Annu. ACM-SIAM Symp. on Discrete Algorithms]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1027-35</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Estivill-Castro]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Why so many clustering algorithms]]></article-title>
<source><![CDATA[ACM SIGKDD Explorations Newsletter]]></source>
<year>2002</year>
<volume>4</volume>
</nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Fahad]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Alshatri]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Tari]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Alamri]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Khalil]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Zomaya]]></surname>
<given-names><![CDATA[A. Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Foufou]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Bouras]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A survey of clustering algorithms for big data: Taxonomy and empirical analysis]]></article-title>
<source><![CDATA[IEEE Transactions on Emerging Topics in Computing]]></source>
<year>2014</year>
<volume>2</volume>
<page-range>267-79</page-range><publisher-name><![CDATA[IEEE]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Frey]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Dueck]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Clustering by passing messages between data points]]></article-title>
<source><![CDATA[Science]]></source>
<year>2007</year>
<volume>315</volume>
<page-range>972-6</page-range><publisher-loc><![CDATA[New York, N.Y. ]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jones]]></surname>
<given-names><![CDATA[K. S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A statistical interpretation of term specificity and its application in retrieval]]></article-title>
<source><![CDATA[Journal of documentation]]></source>
<year>1972</year>
</nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lloyd]]></surname>
<given-names><![CDATA[S. P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Least squares quantization in PCM]]></article-title>
<source><![CDATA[IEEE Trans. Information Theory]]></source>
<year>1982</year>
<volume>28</volume>
<page-range>129-36</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Powers]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation]]></article-title>
<source><![CDATA[J. Mach. Learn. Technol]]></source>
<year>2011</year>
<volume>2</volume>
<page-range>37-63</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rajaraman]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Ullman]]></surname>
<given-names><![CDATA[J. D.]]></given-names>
</name>
</person-group>
<source><![CDATA[Mining of Massive Datasets]]></source>
<year>2011</year>
<publisher-loc><![CDATA[USA ]]></publisher-loc>
<publisher-name><![CDATA[Cambridge University Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Singh]]></surname>
<given-names><![CDATA[V. K.]]></given-names>
</name>
<name>
<surname><![CDATA[Tiwari]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Garg]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Document clustering using k-means, heuristic k-means and fuzzy c-means]]></source>
<year>2011</year>
<conf-name><![CDATA[2011 International Conference on Computational Intelligence and Communication Networks]]></conf-name>
<conf-loc> </conf-loc>
<page-range>297-301</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Steinbach]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Karypis]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Kumar]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
</person-group>
<source><![CDATA[A comparison of document clustering techniques]]></source>
<year>2000</year>
<conf-name><![CDATA[International KDD Workshop on Text Mining]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhao]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Karypis]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<source><![CDATA[Comparison of Agglomerative and Partitional Document Clustering Algorithms]]></source>
<year>2002</year>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
