Using metric space indexing for complete and efficient record linkage

dc.contributor.author: Akgün, Özgür
dc.contributor.author: Dearle, Alan
dc.contributor.author: Kirby, Graham
dc.contributor.author: Christen, Peter
dc.contributor.editor: Phung, D
dc.contributor.editor: Tseng, V S
dc.contributor.editor: Webb, G I
dc.contributor.editor: Ho, B
dc.contributor.editor: Ganji, M
dc.contributor.editor: Rashidi, L
dc.coverage.spatial: Melbourne, Australia
dc.date.accessioned: 2024-02-13T00:55:05Z
dc.date.created: June 3-6, 2018
dc.date.issued: 2018
dc.date.updated: 2022-10-02T07:19:36Z
dc.description.abstract: Record linkage is the process of identifying records that refer to the same real-world entities in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their similarity. Linkage is usually performed in a three-step process: first, groups of similar candidate records are identified using indexing, then pairs within the same group are compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as locality sensitive hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity, or they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing (MSI) to perform complete linkage, resulting in a parameter-free process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. An evaluation on real-world data from several domains shows that linkage using MSI can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process.
dc.description.sponsorship: This work was supported by ESRC grants ES/K00574X/2 “Digitising Scotland” and ES/L007487/1 “Administrative Data Research Centre - Scotland”.
dc.format.mimetype: application/pdf
dc.identifier.isbn: 978-3-319-93039-8
dc.identifier.uri: http://hdl.handle.net/1885/313440
dc.language.iso: en_AU
dc.publisher: Springer Verlag
dc.relation.ispartofseries: 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018
dc.rights: © Springer International Publishing AG, part of Springer Nature 2018
dc.source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.subject: Entity resolution
dc.subject: Data matching
dc.subject: Similarity search
dc.subject: Blocking
dc.title: Using metric space indexing for complete and efficient record linkage
dc.type: Conference paper
local.bibliographicCitation.lastpage: 101
local.bibliographicCitation.startpage: 89
local.contributor.affiliation: Akgün, Özgür, University of St Andrews
local.contributor.affiliation: Dearle, Alan, University of St Andrews
local.contributor.affiliation: Kirby, Graham, University of St Andrews
local.contributor.affiliation: Christen, Peter, College of Engineering and Computer Science, ANU
local.contributor.authoruid: Christen, Peter, u4021539
local.description.embargo: 2099-12-31
local.description.notes: Imported from ARIES
local.description.refereed: Yes
local.identifier.absfor: 460507 - Information extraction and fusion
local.identifier.absfor: 460504 - Data quality
local.identifier.absfor: 460502 - Data mining and knowledge discovery
local.identifier.ariespublication: u3102795xPUB1815
local.identifier.doi: 10.1007/978-3-319-93040-4_8
local.identifier.scopusID: 2-s2.0-85049367618
local.publisher.url: https://link.springer.com/
local.type.status: Published Version
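The abstract contrasts blocking-style indexing, which may never compare two similar records, with metric space indexing. As a rough illustration only, the following Python sketch uses a simple pivot table (in the style of LAESA) with Levenshtein distance on name strings. This record does not say which metric index, distance function or threshold the paper uses, so this is not the authors' method, and every name here (`levenshtein`, `PivotIndex`, `link`, `num_pivots`, `threshold`) is hypothetical. Because edit distance is a metric, the triangle-inequality test only skips records that provably lie outside the query range, so each range query returns every record within the threshold; this is the sense in which metric search can make linkage complete while combining indexing, comparison and classification in one pass.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; it is a metric (obeys the triangle inequality)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


class PivotIndex:
    """Hypothetical pivot table over the records of one data set.

    Each record's distance to a few pivot records is precomputed, so range
    queries can discard records using the triangle inequality without
    computing their full distance to the query.
    """

    def __init__(self, records: List[str], num_pivots: int = 2) -> None:
        self.records = records
        # Naive pivot choice (the first records), for illustration only.
        self.pivots = records[:num_pivots]
        self.table = [[levenshtein(r, p) for p in self.pivots] for r in records]

    def range_query(self, query: str, threshold: int) -> List[int]:
        """Return the indices of all records within `threshold` of `query`."""
        q_to_p = [levenshtein(query, p) for p in self.pivots]
        hits = []
        for idx, r_to_p in enumerate(self.table):
            # For every pivot p: d(q, r) >= |d(q, p) - d(r, p)|. If any such lower
            # bound exceeds the threshold, r cannot be within range, so this test
            # never discards a true in-range record.
            if any(abs(qp - rp) > threshold for qp, rp in zip(q_to_p, r_to_p)):
                continue
            if levenshtein(query, self.records[idx]) <= threshold:
                hits.append(idx)
        return hits


def link(left: List[str], right: List[str], threshold: int) -> List[Tuple[int, int]]:
    """Return every (left, right) index pair within `threshold` of each other."""
    index = PivotIndex(right)
    return [(i, j) for i, rec in enumerate(left) for j in index.range_query(rec, threshold)]


if __name__ == "__main__":
    left = ["john smith", "mary jones"]
    right = ["jon smith", "mary jane", "peter pan"]
    print(link(left, right, threshold=2))  # prints [(0, 0), (1, 1)]
```

In this sketch the distance threshold is the only input; how the paper's approach achieves a parameter-free process is not described in this record. Pivot choice affects how many full distance computations are pruned, but not which pairs are returned.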

Downloads

Original bundle

Name: 978-3-319-93040-4_8.pdf
Size: 488.39 KB
Format: Adobe Portable Document Format