Blind Data Linkage Using n-gram Similarity Comparisons
Integrating or linking data from different sources is an increasingly important task in the preprocessing stage of many data mining projects. The aim of such linkages is to merge all records relating to the same entity, such as a patient or a customer. If no common unique entity identifiers (keys) are available in all data sources, the linkage needs to be performed using the available identifying attributes, like names and addresses. Data confidentiality often limits or even prohibits...
Collections: ANU Research Publications
Source: Advances in Knowledge Discovery and Data Mining. 8th Pacific-Asia Conference, PAKDD 2004 Proceedings