Privacy-preserving matching of similar patients

Vatsalan, Dinusha; Christen, Peter

Description

The identification of similar entities represented by records in different databases has drawn considerable attention in many application areas, including in the health domain. One important type of entity matching application that is vital for quality healthcare analytics is the identification of similar patients, known as similar patient matching. A key component of identifying similar records is the calculation of similarity of the values in attributes (fields) between these records.

dc.contributor.author: Vatsalan, Dinusha
dc.contributor.author: Christen, Peter
dc.date.accessioned: 2016-02-24T22:41:56Z
dc.identifier.issn: 1532-0464
dc.identifier.uri: http://hdl.handle.net/1885/98863
dc.description.abstract: The identification of similar entities represented by records in different databases has drawn considerable attention in many application areas, including in the health domain. One important type of entity matching application that is vital for quality healthcare analytics is the identification of similar patients, known as similar patient matching. A key component of identifying similar records is the calculation of similarity of the values in attributes (fields) between these records. Due to increasing privacy and confidentiality concerns, using the actual attribute values of patient records to identify similar records across different organizations is becoming non-trivial because the attributes in such records often contain highly sensitive information such as personal and medical details of patients. Therefore, the matching needs to be based on masked (encoded) values while being effective and efficient to allow matching of large databases. Bloom filter encoding has widely been used as an efficient masking technique for privacy-preserving matching of string and categorical values. However, no work on Bloom filter-based masking of numerical data, such as integer (e.g. age), floating point (e.g. body mass index), and modulus (numbers wrap around upon reaching a certain value, e.g. date and time), which are commonly required in the health domain, has been presented in the literature. We propose a framework with novel methods for masking numerical data using Bloom filters, thereby facilitating the calculation of similarities between records. We conduct an empirical study on publicly available real-world datasets which shows that our framework provides efficient masking and achieves similar matching accuracy compared to the matching of actual unencoded patient records.
dc.publisher: Academic Press
dc.source: Journal of Biomedical Informatics
dc.title: Privacy-preserving matching of similar patients
dc.type: Journal article
local.description.notes: Imported from ARIES
local.identifier.citationvolume: 59
dc.date.issued: 2015
local.identifier.absfor: 080403 - Data Structures
local.identifier.ariespublication: u4056230xPUB547
local.type.status: Published Version
local.contributor.affiliation: Vatsalan, Dinusha, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Christen, Peter, College of Engineering and Computer Science, ANU
local.description.embargo: 2037-12-31
local.bibliographicCitation.startpage: 285
local.bibliographicCitation.lastpage: 298
local.identifier.doi: 10.1016/j.jbi.2015.12.004
local.identifier.absseo: 970108 - Expanding Knowledge in the Information and Computing Sciences
dc.date.updated: 2016-06-14T08:58:31Z
local.identifier.scopusID: 2-s2.0-84961575473
Collections: ANU Research Publications

Download

File: 01_Vatsalan_Privacy-preserving_matching_of_2015.pdf
Size: 1.72 MB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
