Scalable entity resolution using probabilistic signatures on parallel databases

dc.contributor.author: Zhang, Yuhang
dc.contributor.author: Ng, Kee Siong
dc.contributor.author: Churchill, Tania
dc.contributor.author: Christen, Peter
dc.contributor.editor: Paton, N
dc.contributor.editor: Candan, S
dc.contributor.editor: Wan, H
dc.contributor.editor: Allan, J
dc.contributor.editor: Agrawal, R
dc.contributor.editor: Labrinidis, A
dc.coverage.spatial: Torino, Italy
dc.date.accessioned: 2024-02-13T23:28:03Z
dc.date.created: October 22-26 2018
dc.date.issued: 2018
dc.date.updated: 2022-10-02T07:19:41Z
dc.description.abstract: Accurate and efficient entity resolution is an open challenge of particular relevance to intelligence organisations that collect large datasets from disparate sources with differing levels of quality and standard. Starting from a first-principles formulation of entity resolution, this paper presents a novel entity resolution algorithm that introduces a data-driven blocking and record linkage technique based on the probabilistic identification of entity signatures in data. The scalability and accuracy of the proposed algorithm are evaluated using benchmark datasets and shown to achieve state-of-the-art results. The proposed algorithm can be implemented simply on modern parallel databases, which we have done in the financial intelligence domain with tens of terabytes of noisy data. [en_AU]
dc.format.mimetype: application/pdf [en_AU]
dc.identifier.isbn: 978-145036014-2 [en_AU]
dc.identifier.uri: http://hdl.handle.net/1885/313562
dc.language.iso: en_AU [en_AU]
dc.publisher: Association for Computing Machinery (ACM) [en_AU]
dc.relation.ispartofseries: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 [en_AU]
dc.rights: © 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM. [en_AU]
dc.source: International Conference on Information and Knowledge Management, Proceedings [en_AU]
dc.subject: Large-scale entity resolution [en_AU]
dc.subject: connected components [en_AU]
dc.subject: probabilistic signature [en_AU]
dc.subject: in-database analytics [en_AU]
dc.title: Scalable entity resolution using probabilistic signatures on parallel databases [en_AU]
dc.type: Conference paper [en_AU]
local.bibliographicCitation.lastpage: 2221 [en_AU]
local.bibliographicCitation.startpage: 2213 [en_AU]
local.contributor.affiliation: Zhang, Yuhang, AUSTRAC [en_AU]
local.contributor.affiliation: Ng, Kee Siong, College of Engineering and Computer Science, ANU [en_AU]
local.contributor.affiliation: Churchill, Tania, AUSTRAC [en_AU]
local.contributor.affiliation: Christen, Peter, College of Engineering and Computer Science, ANU [en_AU]
local.contributor.authoremail: u9914730@anu.edu.au [en_AU]
local.contributor.authoruid: Ng, Kee Siong, u9914730 [en_AU]
local.contributor.authoruid: Christen, Peter, u4021539 [en_AU]
local.description.embargo: 2099-12-31
local.description.notes: Imported from ARIES [en_AU]
local.description.refereed: Yes
local.identifier.absfor: 460507 - Information extraction and fusion [en_AU]
local.identifier.absfor: 460502 - Data mining and knowledge discovery [en_AU]
local.identifier.ariespublication: u3102795xPUB230 [en_AU]
local.identifier.doi: 10.1145/3269206.3272016 [en_AU]
local.identifier.scopusID: 2-s2.0-85058054904
local.identifier.thomsonID: WOS:000455712300297
local.identifier.uidSubmittedBy: u3102795 [en_AU]
local.publisher.url: https://dl.acm.org/ [en_AU]
local.type.status: Published Version [en_AU]

Downloads

Original bundle

Name: ScalableEntityResolutionUsingProbabilisticSignatures.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format