Robust Record Linkage Blocking using Suffix Arrays
Record linkage is an important data integration task with many practical uses in matching, merging and duplicate removal across large and diverse databases. However, the quadratic cost of the brute-force approach necessitates the design of appropriate indexing or blocking techniques. We design and evaluate an efficient and highly scalable blocking approach based on suffix arrays. Our suffix grouping technique exploits the ordering used by the index to merge similar blocks at marginal...
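To make the idea of suffix-based blocking concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `min_len` and `max_block` parameters, and the sample records are all hypothetical, chosen only for illustration. Each record contributes every suffix of its blocking key above a minimum length, records that share a suffix fall into the same block, and only pairs within a block are compared instead of all pairs.

```python
from collections import defaultdict
from itertools import combinations


def suffixes(key, min_len):
    """Return every suffix of `key` with at least `min_len` characters.

    Keys shorter than `min_len` yield no suffixes in this sketch.
    """
    return [key[i:] for i in range(len(key) - min_len + 1)]


def suffix_blocks(records, min_len=4, max_block=3):
    """Build blocks keyed by suffix, in sorted suffix order.

    `records` maps a record id to its blocking key (hypothetical layout).
    Suffixes shared by more than `max_block` records are treated as too
    common to be useful and are dropped.
    """
    index = defaultdict(set)
    for rec_id, key in records.items():
        for suf in suffixes(key, min_len):
            index[suf].add(rec_id)
    # Iterating over sorted suffixes places similar suffixes next to each other.
    return {
        suf: sorted(index[suf])
        for suf in sorted(index)
        if len(index[suf]) <= max_block
    }


def candidate_pairs(blocks):
    """Collect the record pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs


# Hypothetical sample data: record id -> blocking key.
records = {1: "catherine", 2: "katherine", 3: "katherina", 4: "smith"}
blocks = suffix_blocks(records)
pairs = candidate_pairs(blocks)
```

With these four records, brute force would compare six pairs, while the sketch produces a single candidate pair, records 1 and 2, which share the suffix `atherine`. Record 3 (`katherina`) differs only in its last character and therefore shares no suffix with the others, yet `atherina` and `atherine` are adjacent in the sorted suffix order. The abstract states that the suffix grouping technique exploits this ordering to merge similar blocks; how it does so is not reproduced here.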
|Collections|ANU Research Publications|
|Source|Proceedings of the 18th ACM Conference on Information and Knowledge Management|
|File|01_De Vries_Robust_Record_Linkage_Blocking_2009.pdf (417.12 kB, Adobe PDF)|