A Scalable Blocking Framework for Multidatabase Privacy-preserving Record Linkage

Ranbaduge, Thilina


dc.contributor.author: Ranbaduge, Thilina
dc.date.accessioned: 2018-02-19T00:27:31Z
dc.date.available: 2018-02-19T00:27:31Z
dc.identifier.other: b4959414x
dc.identifier.uri: http://hdl.handle.net/1885/140918
dc.description.abstract: Today many application domains, such as national statistics, healthcare, business analytics, fraud detection, and national security, require data to be integrated from multiple databases. Record linkage (RL) is a process used in data integration that links multiple databases to identify matching records that belong to the same entity. RL enriches the usefulness of data by removing duplicates, errors, and inconsistencies, which improves the effectiveness of decision making in data analytics applications. Often, organisations are not willing or authorised to share the sensitive information in their databases with any other party due to privacy and confidentiality regulations. The linkage of databases of different organisations is an emerging research area known as privacy-preserving record linkage (PPRL). PPRL facilitates the linkage of databases by ensuring the privacy of the entities in these databases. In a multidatabase (MD) context, PPRL is significantly challenged by the intrinsic exponential growth in the number of potential record pair comparisons. Such linkage often requires significant time and computational resources to produce the resulting matching sets of records. Because the risk of collusion increases, preserving the privacy of the data becomes more problematic as the number of parties involved in the linkage process grows. Blocking is commonly used to scale the linkage of large databases. The aim of blocking is to remove those record pairs that correspond to non-matches (that is, pairs that refer to different entities). Many techniques have been proposed for RL and PPRL for blocking two databases. However, many of these techniques are not suitable for blocking multiple databases. This creates a need to develop blocking techniques for the multidatabase linkage context, as real-world applications increasingly require more than two databases.
This thesis is the first to conduct extensive research on blocking for multidatabase privacy-preserving record linkage (MD-PPRL). We consider several research problems in blocking for MD-PPRL. First, we start with a broad review of the background literature on PPRL. This allows us to identify the main research gaps that need to be investigated in MD-PPRL. Second, we introduce a blocking framework for MD-PPRL which provides more flexibility and control to database owners in the block generation process. Third, we propose different techniques that are used in our framework for (1) blocking of multiple databases, (2) identifying blocks that need to be compared across subgroups of these databases, and (3) filtering redundant record pair comparisons by the efficient scheduling of block comparisons to improve the scalability of MD-PPRL. Each of these techniques covers an important aspect of blocking in real-world MD-PPRL applications. Finally, this thesis reports on an extensive evaluation of the combined application of these methods with real datasets, which illustrates that they outperform existing approaches in terms of scalability, accuracy, and privacy.
dc.language.iso: en
dc.subject: Record Linkage
dc.subject: Privacy
dc.subject: Multidatabase
dc.subject: Scalable Blocking
dc.subject: Blocking Framework
dc.title: A Scalable Blocking Framework for Multidatabase Privacy-preserving Record Linkage
dc.type: Thesis (PhD)
local.contributor.supervisor: Christen, Peter
local.contributor.supervisorcontact: peter.christen@anu.edu.au
dcterms.valid: 2018
local.description.notes: the author deposited 19/02/18
local.type.degree: Doctor of Philosophy (PhD)
dc.date.issued: 2017
local.contributor.affiliation: Research School of Computer Science, College of Engineering and Computer Science, The Australian National University
local.identifier.doi: 10.25911/5d6e4942bae9d
local.identifier.proquest: Yes
local.mintdoi: mint
Collections: Open Access Theses

Download

File: Ranbaduge Thesis 2018.pdf
Size: 5.34 MB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated: 17 November 2022 / Responsible Officer: University Librarian / Page Contact: Library Systems & Web Coordinator