Scalable entity resolution using probabilistic signatures on parallel databases
dc.contributor.author | Zhang, Yuhang | |
dc.contributor.author | Ng, Kee Siong | |
dc.contributor.author | Churchill, Tania | |
dc.contributor.author | Christen, Peter | |
dc.contributor.editor | Paton, N | |
dc.contributor.editor | Candan, S | |
dc.contributor.editor | Wan, H | |
dc.contributor.editor | Allan, J | |
dc.contributor.editor | Agrawal, R | |
dc.contributor.editor | Labrinidis, A | |
dc.coverage.spatial | Torino, Italy | |
dc.date.accessioned | 2024-02-13T23:28:03Z | |
dc.date.created | October 22-26 2018 | |
dc.date.issued | 2018 | |
dc.date.updated | 2022-10-02T07:19:41Z | |
dc.description.abstract | Accurate and efficient entity resolution is an open challenge of particular relevance to intelligence organisations that collect large datasets from disparate sources with differing levels of quality and standard. Starting from a first-principles formulation of entity resolution, this paper presents a novel entity resolution algorithm that introduces a data-driven blocking and record linkage technique based on the probabilistic identification of entity signatures in data. The scalability and accuracy of the proposed algorithm are evaluated using benchmark datasets and shown to achieve state-of-the-art results. The proposed algorithm can be implemented simply on modern parallel databases, which we have done in the financial intelligence domain with tens of terabytes of noisy data. | en_AU |
dc.format.mimetype | application/pdf | en_AU |
dc.identifier.isbn | 978-145036014-2 | en_AU |
dc.identifier.uri | http://hdl.handle.net/1885/313562 | |
dc.language.iso | en_AU | en_AU |
dc.publisher | Association for Computing Machinery (ACM) | en_AU |
dc.relation.ispartofseries | 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 | en_AU |
dc.rights | © 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM | en_AU |
dc.source | International Conference on Information and Knowledge Management, Proceedings | en_AU |
dc.subject | Large-scale entity resolution | en_AU |
dc.subject | connected components | en_AU |
dc.subject | probabilistic signature | en_AU |
dc.subject | in-database analytics | en_AU |
dc.title | Scalable entity resolution using probabilistic signatures on parallel databases | en_AU |
dc.type | Conference paper | en_AU |
local.bibliographicCitation.lastpage | 2221 | en_AU |
local.bibliographicCitation.startpage | 2213 | en_AU |
local.contributor.affiliation | Zhang, Yuhang, AUSTRAC | en_AU |
local.contributor.affiliation | Ng, Kee Siong, College of Engineering and Computer Science, ANU | en_AU |
local.contributor.affiliation | Churchill, Tania, AUSTRAC | en_AU |
local.contributor.affiliation | Christen, Peter, College of Engineering and Computer Science, ANU | en_AU |
local.contributor.authoremail | u9914730@anu.edu.au | en_AU |
local.contributor.authoruid | Ng, Kee Siong, u9914730 | en_AU |
local.contributor.authoruid | Christen, Peter, u4021539 | en_AU |
local.description.embargo | 2099-12-31 | |
local.description.notes | Imported from ARIES | en_AU |
local.description.refereed | Yes | |
local.identifier.absfor | 460507 - Information extraction and fusion | en_AU |
local.identifier.absfor | 460502 - Data mining and knowledge discovery | en_AU |
local.identifier.ariespublication | u3102795xPUB230 | en_AU |
local.identifier.doi | 10.1145/3269206.3272016 | en_AU |
local.identifier.scopusID | 2-s2.0-85058054904 | |
local.identifier.thomsonID | WOS:000455712300297 | |
local.identifier.uidSubmittedBy | u3102795 | en_AU |
local.publisher.url | https://dl.acm.org/ | en_AU |
local.type.status | Published Version | en_AU |
Downloads
Original bundle
- Name: ScalableEntityResolutionUsingProbabilisticSignatures.pdf
- Size: 1.2 MB
- Format: Adobe Portable Document Format