Regression classification for Improved Temporal Record Linkage
Date
2016
Authors
Wang, Qing
Vatsalan, Dinusha
Christen, Peter
Hu, Yichen
Publisher
Australasian Data Mining Conference
Abstract
Temporal record linkage is the process of identifying groups of records that represent the same real-world entities in data collected over long periods of time, such as census databases or voter registration databases. These datasets often contain temporal information for each record, such as the time when the record was created or the time when it was modified. Unlike traditional record linkage, which treats differences between records of the same entity as errors or variations, temporal record linkage aims to capture records of entities whose details change over time. This paper proposes a temporal record linkage approach that learns the probabilities that attribute values of records change within different periods of time, extending the decay model of an existing temporal approach. The proposed method uses a regression-based machine learning model to predict decay for sets of attributes, where attribute values in each set could affect the decay of others. Our experimental results show that the proposed approach achieves generally better recall than baseline approaches on real-world datasets.
Keywords
Data matching, entity resolution, record linkage, temporal data
Source
Conferences in Research and Practice in Information Technology
Type
Conference paper
Restricted until
2099-12-31