Noise-tolerant approximate blocking for dynamic real-time entity resolution

dc.contributor.author | Liang, Huizhi | en
dc.contributor.author | Wang, Yanzhe | en
dc.contributor.author | Christen, Peter | en
dc.contributor.author | Gayler, Ross | en
dc.date.accessioned | 2026-01-01T08:42:43Z
dc.date.available | 2026-01-01T08:42:43Z
dc.date.issued | 2014 | en
dc.description.abstract | Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain, for example, wrong pronunciation or spelling errors. Many real-world applications require rapid responses to entity queries on dynamic datasets. This poses challenges for existing approaches, which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for entity resolution in real time remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach that indexes records based on their distance ranges, using LSH and sorting trees within large hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach. | en
dc.description.sponsorship | This research was funded by the Australian Research Council (ARC), Veda Advantage, and Funnelback Pty. Ltd., under Linkage Project LP100200079. Note that the first two authors contributed equally. | en
dc.description.status | Peer-reviewed | en
dc.format.extent | 12 | en
dc.identifier.issn | 0302-9743 | en
dc.identifier.other | ORCID:/0000-0003-3435-2015/work/162291384 | en
dc.identifier.scopus | 84901276901 | en
dc.identifier.uri | https://hdl.handle.net/1885/733799345
dc.language.iso | en | en
dc.relation.ispartofseries | 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2014 | en
dc.source | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en
dc.subject | Entity Resolution | en
dc.subject | Indexing | en
dc.subject | Locality Sensitive Hashing | en
dc.subject | Real-time | en
dc.title | Noise-tolerant approximate blocking for dynamic real-time entity resolution | en
dc.type | Conference paper | en
dspace.entity.type | Publication | en
local.bibliographicCitation.lastpage | 460 | en
local.bibliographicCitation.startpage | 449 | en
local.contributor.affiliation | Liang, Huizhi; School of Computing, ANU College of Systems and Society, The Australian National University | en
local.contributor.affiliation | Wang, Yanzhe; School of Computing, ANU College of Systems and Society, The Australian National University | en
local.contributor.affiliation | Christen, Peter; School of Computing, ANU College of Systems and Society, The Australian National University | en
local.contributor.affiliation | Gayler, Ross; Veda | en
local.identifier.ariespublication | U3488905xPUB2761 | en
local.identifier.citationvolume | 8444 LNAI | en
local.identifier.doi | 10.1007/978-3-319-06605-9_37 | en
local.identifier.pure | 80bcba48-2017-439d-aa7b-fbb53a196b8c | en
local.identifier.url | https://www.scopus.com/pages/publications/84901276901 | en
local.type.status | Published | en
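
The abstract describes LSH as hashing records that lie within a certain distance of each other into the same block with high probability. A minimal sketch of that idea follows, assuming MinHash over character bigrams with banding. This is a generic LSH blocking scheme chosen for illustration, not the authors' method, and it omits the sorting trees the paper builds inside large blocks. The names LSHBlockIndex, shingles and minhash_signature, and the parameters bands and rows, are hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(text, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    s = text.lower()
    if len(s) < q:
        return {s}  # very short strings become a single token
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def minhash_signature(grams, num_hashes):
    """Build a MinHash signature: one minimum per salted hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8,
                                salt=salt).digest(), "big")
            for g in grams))
    return tuple(signature)


class LSHBlockIndex:
    """Dynamic blocking index: records can be inserted and queried at any time."""

    def __init__(self, bands=8, rows=2):
        self.bands = bands          # number of hash tables (bands)
        self.rows = rows            # MinHash values concatenated per band
        self.blocks = defaultdict(set)
        self.records = {}

    def _block_keys(self, text):
        sig = minhash_signature(shingles(text), self.bands * self.rows)
        for b in range(self.bands):
            yield (b, sig[b * self.rows:(b + 1) * self.rows])

    def insert(self, rec_id, text):
        """Add a record to every block its signature bands map to."""
        self.records[rec_id] = text
        for key in self._block_keys(text):
            self.blocks[key].add(rec_id)

    def query(self, text):
        """Return ids of records sharing at least one block with the query."""
        candidates = set()
        for key in self._block_keys(text):
            candidates |= self.blocks.get(key, set())
        return candidates


index = LSHBlockIndex()
index.insert("r1", "peter christen")
index.insert("r2", "ross gayler")
# A misspelled query is likely, but not guaranteed, to share a block with r1.
print(index.query("peter christan"))
```

More bands make it more likely that noisy variants of a record share a block, at the cost of larger candidate sets; more rows per band have the opposite effect. Because insert and query touch only the blocks for one record, the index can be updated while queries are served, which is the dynamic setting the abstract refers to.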
