Services on Demand
Journal
Article
Indicators
- Cited by SciELO
- Access statistics
Related links
- Similars in SciELO
Share
Polibits
On-line version ISSN 1870-9044
Abstract
DăNăILă, Iulia; DINU, Liviu P.; NICULAE, Vlad and SULEA, Octavia-Maria. String Distances for Near-duplicate Detection. Polibits [online]. 2012, n.45, pp.21-25. ISSN 1870-9044.
Near-duplicate detection is important when dealing with large, noisy databases in data mining tasks. In this paper, we present the results of applying the Rank distance and the Smith-Waterman distance, along with more popular string similarity measures such as the Levenshtein distance, together with a disjoint set data structure, for the problem of near-duplicate detection.
Keywords : Near-duplicate detection; string similarity measures; database; data mining.