SciELO - Scientific Electronic Library Online

 

Polibits

On-line version ISSN 1870-9044

Abstract

DĂNĂILĂ, Iulia; DINU, Liviu P.; NICULAE, Vlad and SULEA, Octavia-Maria. String Distances for Near-duplicate Detection. Polibits [online]. 2012, n.45, pp.21-25. ISSN 1870-9044.

Near-duplicate detection is important when dealing with large, noisy databases in data mining tasks. In this paper, we present the results of applying the Rank distance and the Smith-Waterman distance, as well as more widely used string similarity measures such as the Levenshtein distance, combined with a disjoint-set data structure, to the problem of near-duplicate detection.
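To make the general idea concrete, here is a minimal sketch of how a string distance and a disjoint-set (union-find) structure can be combined to group near-duplicate records. It is an illustration under stated assumptions, not the authors' method: it uses only the Levenshtein distance (not the Rank or Smith-Waterman distances), compares every pair of records, and treats two records as near-duplicates when their edit distance divided by the longer string's length is at most a hypothetical threshold `max_norm_dist`. The function name `near_duplicate_groups` and the 0.2 default are illustrative choices.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # attach the smaller tree under the larger
        self.size[rx] += self.size[ry]


def near_duplicate_groups(records, max_norm_dist=0.2):
    """Hypothetical helper: group records whose normalized edit distance
    is at most max_norm_dist. The threshold and the normalization by the
    longer string's length are assumptions, not taken from the paper."""
    ds = DisjointSet(len(records))
    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        longest = max(len(a), len(b)) or 1  # avoid division by zero
        if levenshtein(a, b) / longest <= max_norm_dist:
            ds.union(i, j)
    # Collect records by the representative of their set
    groups = {}
    for i in range(len(records)):
        groups.setdefault(ds.find(i), []).append(records[i])
    return list(groups.values())


records = ["John Smith, 12 Main St", "Jon Smith, 12 Main St", "Mary Jones, 4 Oak Ave"]
print(near_duplicate_groups(records))
# [['John Smith, 12 Main St', 'Jon Smith, 12 Main St'], ['Mary Jones, 4 Oak Ave']]
```

The disjoint-set structure makes grouping transitive: if A is close to B and B is close to C, all three end up in one group even when A and C exceed the threshold. Any other string distance, such as the Rank or Smith-Waterman distances named above, could be substituted for `levenshtein` in the pairwise test.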

Keywords: Near-duplicate detection; string similarity measures; database; data mining.


 

All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.