SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

Print version ISSN 1405-5546

Abstract

RAMIREZ-CRUZ, Yunior. Introducing Biases in Document Clustering. Comp. y Sist. [online]. 2014, vol.18, n.1, pp.137-151. ISSN 1405-5546.  http://dx.doi.org/10.13053/CyS-18-1-2014-024.

In this paper, we present three criteria for introducing biases in document clustering algorithms, when information characterizing the document collections is available. We focus on collections known to be the result of a document categorization or sample-based document filtering process. Our proposals rely on profiles, i.e., document samples known to have been used for obtaining the collection, to extract statistics which determine the biases to introduce. We conduct an experimental evaluation over a number of collections extracted from the widely used corpus RCV1, which allows us to confirm the validity of our proposals and determine a number of situations where biased clusterings, according to different criteria, outperform their unbiased counterparts.

Keywords: Document clustering; introducing biases.


 

Creative Commons License: All contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.