Robust Record Linkage Blocking using Suffix Arrays

De Vries, Timothy; Ke, Hui; Chawla, Sanjay; Christen, Peter


Record linkage is an important data integration task with many practical uses in matching, merging and duplicate removal across large and diverse databases. However, the quadratic complexity of the brute-force approach necessitates the design of appropriate indexing or blocking techniques. We design and evaluate an efficient and highly scalable blocking approach based on suffix arrays. Our suffix grouping technique exploits the ordering used by the index to merge similar blocks at marginal...
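
To illustrate the general idea of suffix-based blocking mentioned in the abstract, here is a minimal Python sketch. It is not the authors' implementation and does not include their suffix grouping step. It assumes each record has already been reduced to a single blocking key string. The names `suffixes`, `suffix_blocks` and `candidate_pairs`, the parameters `min_len` and `max_block`, and the sample records are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def suffixes(key, min_len):
    """Return every suffix of `key` that has at least `min_len` characters.

    Keys shorter than `min_len` yield no suffixes in this sketch.
    """
    return [key[i:] for i in range(len(key) - min_len + 1)]


def suffix_blocks(records, min_len=3, max_block=3):
    """Build an inverted index: suffix -> sorted list of record ids.

    Suffixes are kept in sorted order, as they would be in a suffix array.
    Blocks with a single record (nothing to compare) or with more than
    `max_block` records (too common to be discriminating) are dropped.
    """
    index = defaultdict(set)
    for rec_id, key in records.items():
        for s in suffixes(key, min_len):
            index[s].add(rec_id)
    return {
        s: sorted(ids)
        for s, ids in sorted(index.items())
        if 2 <= len(ids) <= max_block
    }


def candidate_pairs(blocks):
    """Collect the distinct record pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs


# Hypothetical blocking keys (for example, a given-name or surname field).
records = {1: "catherine", 2: "katherine", 3: "smith", 4: "smyth"}

blocks = suffix_blocks(records)
pairs = candidate_pairs(blocks)
print(pairs)
```

With four records, brute force would compare 4 * 3 / 2 = 6 pairs. In this sketch only records 1 and 2 share any suffix, so a single candidate pair is produced. The pair "smith"/"smyth" is missed because the differing character sits close to the end of the key and no suffix of length 3 or more is shared, which is the kind of sensitivity that a more robust blocking scheme aims to reduce.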

Collections: ANU Research Publications
Date published: 2009
Type: Conference paper
Source: Proceedings of the 18th ACM Conference on Information and Knowledge Management
DOI: 10.1145/1645953.1645994


File: 01_De Vries_Robust_Record_Linkage_Blocking_2009.pdf
Size: 417.12 kB
Format: Adobe PDF

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
