Regression classification for Improved Temporal Record Linkage

Date

2016

Authors

Wang, Qing
Vatsalan, Dinusha
Christen, Peter
Hu, Yichen

Publisher

Australasian Data Mining Conference

Abstract

Temporal record linkage is the process of identifying groups of records that represent the same real-world entities in data collected over long periods of time, such as census databases or voter registration databases. These datasets often contain temporal information for each record, such as the time when a record was created or the time when it was modified. Unlike traditional record linkage, which treats differences between records from the same entity as errors or variations, temporal record linkage aims to capture records of entities whose details change over time. This paper proposes a temporal record linkage approach that learns the probabilities that attribute values of records change within different periods of time, extending the decay model of an existing temporal approach. The proposed method uses a regression-based machine learning model to predict decay with sets of attributes, where the attribute values in each set can affect the decay of the others. Our experimental results show that the proposed approach generally achieves better recall than baseline approaches on real-world datasets.
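
To make the decay idea concrete, the following is a minimal sketch and not the paper's method. It assumes toy labelled pairs of records known to belong to the same entity, estimates the fraction of pairs whose attribute value changed for each time gap, and fits a single-predictor least-squares line to predict that fraction for unseen gaps. All function names, the choice of a linear model, and the data are hypothetical illustrations.

```python
from collections import defaultdict


def empirical_decay(pairs):
    """Fraction of same-entity record pairs whose value changed, per time gap.

    pairs: iterable of (gap_in_years, changed) tuples (hypothetical toy format).
    """
    counts = defaultdict(lambda: [0, 0])  # gap -> [changed_count, total_count]
    for gap, changed in pairs:
        counts[gap][0] += int(changed)
        counts[gap][1] += 1
    return {gap: changed / total for gap, (changed, total) in counts.items()}


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_decay(a, b, gap):
    """Predicted change probability for a time gap, clipped to [0, 1]."""
    return min(1.0, max(0.0, a + b * gap))


# Hypothetical toy data: (time gap in years, did the attribute value change?)
pairs = [
    (1, False), (1, False), (1, True), (1, False),
    (5, True), (5, False), (5, True), (5, False),
    (9, True), (9, True), (9, True), (9, False),
]

decay = empirical_decay(pairs)
gaps = sorted(decay)
a, b = fit_line(gaps, [decay[g] for g in gaps])
```

A linkage step could then use `predict_decay(a, b, gap)` to reduce the penalty for disagreeing attribute values when the time gap between two records is large. Predicting decay from several attributes at once, as the abstract describes, would need a multi-predictor regression in place of `fit_line`.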

Keywords

Data matching, entity resolution, record linkage, temporal data

Source

Conferences in Research and Practice in Information Technology

Type

Conference paper

Restricted until

2099-12-31