An Efficient Two-Party Protocol for Approximate Matching in Private Record Linkage

Date

2010

Authors

Vatsalan, Dinusha
Christen, Peter
Verykios, Vassilios

Journal Title

Journal ISSN

Volume Title

Publisher

Australian Computer Society Inc.

Abstract

The task of linking multiple databases with the aim to identify records that refer to the same entity is occurring increasingly in many application areas. If unique identifiers for the entities are not available in all the databases to be linked, techniques that calculate approximate similarities between records must be used for the identification of matching pairs of records. Often, the records to be linked contain personal information such as names and addresses. In many applications, the exchange of attribute values that contain such personal details between organisations is not allowed due to privacy concerns. The linking of records between databases without revealing the actual attribute values in these records is the research problem known as 'privacy-preserving record linkage' (PPRL).While various approaches have been proposed to deal with privacy within the record linkage process, a viable solution that is well applicable to real-world conditions needs to address the major aspect of scalability of linking very large databases while preserving security and linkage quality. We propose a novel two-party protocol for PPRL that addresses scalability, security and quality/ accuracy. The protocol is based on (1) the use of reference values that are available to both database owners, and allows them to individually calculate the similarities between their attribute values and the reference values; and (2) the binning of these calculated similarity values to allow their secure exchange between the two database owners. Experiments on a real-world database with nearly two million records yield linkage results that have a linear scalability to large databases and high linkage accuracy, allowing for approximate matching in the privacy-preserving context. Since the protocol has a low computational burden and allows quality approximate matching while still preserving the privacy of the databases that are matched, the protocol can be useful for many real-world applications requiring PPRL.

Description

Keywords

Keywords: Application area; Approximate matching; Attribute values; Binning; Computational burden; Large database; Personal information; Privacy concerns; Privacy preserving; Real-world application; Real-world database; Record linkage; Reference values; Research pr Approximate matching; Binning; Entity resolution; Privacy technologies; Scalability; Similarity measure; Two-party protocol

Citation

Source

Conferences in Research and Practice in Information Technology - CRPIT

Type

Journal article

Book Title

Entity type

Access Statement

License Rights

DOI

Restricted until