Clustering-Based Scalable Indexing for Multi-party Privacy

Ranbaduge, Thilina; Vatsalan, Dinusha; Christen, Peter

doi:10.1007/978-3-319-18032-8_43

Description

The identification of common sets of records in multiple databases has become an increasingly important subject in many application areas, including banking, health, and national security. Often privacy concerns and regulations prevent the owners of the databases from sharing any sensitive details of their records with each other, and with any other party. The linkage of records in multiple databases while preserving privacy and confidentiality is an emerging research discipline known as privacy-preserving record linkage (PPRL). We propose a novel two-step indexing (blocking) approach for PPRL between multiple (more than two) parties. First, we generate small mini-blocks using a multi-bit Bloom filter splitting method; second, we merge these mini-blocks based on their similarity using a novel hierarchical canopy clustering technique. An empirical study conducted with large datasets of up to one million records shows that our approach is scalable with the size of the datasets and the number of parties, while providing better privacy than previous multi-party indexing approaches.
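The first step named in the abstract, splitting Bloom filters into mini-blocks by several bit positions, can be illustrated with a minimal sketch. This is not the authors' algorithm: the bigram encoding, the SHA-256 based hashing, the filter length, the fixed choice of bit positions, and all function names below are assumptions made for illustration only. The hierarchical canopy clustering that merges mini-blocks is not reproduced; a Dice coefficient is included only as one common way to compare two Bloom filters.

```python
import hashlib
from collections import defaultdict


def bigrams(value):
    """Return the set of character 2-grams of a string."""
    return {value[i:i + 2] for i in range(len(value) - 1)}


def bloom_filter(value, length=64, num_hashes=2):
    """Encode a string's bigrams as a Bloom filter (list of 0/1 bits).

    The hashing scheme here is an illustrative assumption.
    """
    bits = [0] * length
    for gram in bigrams(value.lower()):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits[int(digest, 16) % length] = 1
    return bits


def split_into_mini_blocks(filters, positions):
    """Group record ids by the bit pattern found at the chosen positions.

    With k positions, records fall into at most 2**k mini-blocks.
    """
    blocks = defaultdict(list)
    for record_id, bits in filters.items():
        key = "".join(str(bits[p]) for p in positions)
        blocks[key].append(record_id)
    return dict(blocks)


def dice_similarity(a, b):
    """Dice coefficient of two equal-length bit lists."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * common / total if total else 0.0


if __name__ == "__main__":
    names = ["smith", "smith", "jones"]  # hypothetical record values
    filters = {i: bloom_filter(n) for i, n in enumerate(names)}
    print(split_into_mini_blocks(filters, positions=[0, 5, 9]))
    print(dice_similarity(filters[0], filters[2]))
```

In this sketch the bit positions are fixed by hand; how the positions are chosen and how the resulting mini-blocks are merged across parties is described in the paper itself.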

dc.contributor.author	Ranbaduge, Thilina
dc.contributor.author	Vatsalan, Dinusha
dc.contributor.author	Christen, Peter
dc.coverage.spatial	Ho Chi Minh City, Vietnam
dc.date.accessioned	2016-06-14T23:21:12Z
dc.date.created	May 19-22 2015
dc.identifier.isbn	9783319180311
dc.identifier.uri	http://hdl.handle.net/1885/103766
dc.description.abstract	The identification of common sets of records in multiple databases has become an increasingly important subject in many application areas, including banking, health, and national security. Often privacy concerns and regulations prevent the owners of the databases from sharing any sensitive details of their records with each other, and with any other party. The linkage of records in multiple databases while preserving privacy and confidentiality is an emerging research discipline known as privacy-preserving record linkage (PPRL). We propose a novel two-step indexing (blocking) approach for PPRL between multiple (more than two) parties. First, we generate small mini-blocks using a multi-bit Bloom filter splitting method; second, we merge these mini-blocks based on their similarity using a novel hierarchical canopy clustering technique. An empirical study conducted with large datasets of up to one million records shows that our approach is scalable with the size of the datasets and the number of parties, while providing better privacy than previous multi-party indexing approaches.
dc.publisher	Springer International Publishing AG
dc.relation.ispartofseries	Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2015
dc.source	Efficient Interactive Training Selection for Large-Scale Entity Resolution
dc.title	Clustering-Based Scalable Indexing for Multi-party Privacy
dc.type	Conference paper
local.description.notes	Imported from ARIES
local.description.refereed	Yes
dc.date.issued	2015
local.identifier.absfor	080109 - Pattern Recognition and Data Mining
local.identifier.ariespublication	u4334215xPUB1486
local.type.status	Published Version
local.contributor.affiliation	Ranbaduge, Thilina, College of Engineering and Computer Science, ANU
local.contributor.affiliation	Vatsalan, Dinusha, College of Engineering and Computer Science, ANU
local.contributor.affiliation	Christen, Peter, College of Engineering and Computer Science, ANU
local.description.embargo	2037-12-31
local.bibliographicCitation.startpage	549
local.bibliographicCitation.lastpage	561
local.identifier.doi	10.1007/978-3-319-18032-8_43
local.identifier.absseo	970108 - Expanding Knowledge in the Information and Computing Sciences
dc.date.updated	2016-06-14T09:02:46Z
local.identifier.scopusID	2-s2.0-84945585620
Collections	ANU Research Publications

Download

File	Description	Size	Format	Image
01_Ranbaduge_Clustering-Based_Scalable_2015.pdf		381.66 kB	Adobe PDF	Request a copy
