Automatic Training Example Selection for Scalable Unsupervised Record Linkage

dc.contributor.author: Christen, Peter
dc.coverage.spatial: Osaka, Japan
dc.date.accessioned: 2015-12-08T22:48:29Z
dc.date.created: May 20-23, 2008
dc.date.issued: 2008
dc.date.updated: 2015-12-08T11:06:40Z
dc.description.abstract: Linking records from two or more databases is an increasingly important data preparation step in many data mining projects, as linked data can enable studies that are not feasible otherwise, or that would require expensive collection of specific data. The aim of such linkages is to match all records that refer to the same entity. One of the main challenges in record linkage is the accurate classification of record pairs into matches and non-matches. Many modern classification techniques are based on supervised machine learning and thus require training data, which is often not available in real-world situations. A novel two-step approach to unsupervised record pair classification is presented in this paper. In the first step, training examples are selected automatically, and they are then used in the second step to train a binary classifier. An experimental evaluation shows that this approach can outperform k-means clustering and also be much faster than other classification techniques.
dc.identifier.isbn: 9783540681243
dc.identifier.uri: http://hdl.handle.net/1885/38353
dc.publisher: Springer
dc.relation.ispartofseries: Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008)
dc.source: Advances in Knowledge Discovery and Data Mining: 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Proceedings
dc.subject: Keywords: Automatic programming; Binary decision diagrams; Classification (of information); Clustering algorithms; Data mining; Support vector machines; Data linkage; Data mining preprocessing; Entity resolution; k-means clustering; Unsupervised learning; Clustering
dc.title: Automatic Training Example Selection for Scalable Unsupervised Record Linkage
dc.type: Conference paper
local.bibliographicCitation.lastpage: 528
local.bibliographicCitation.startpage: 511
local.contributor.affiliation: Christen, Peter, College of Engineering and Computer Science, ANU
local.contributor.authoruid: Christen, Peter, u4021539
local.description.embargo: 2037-12-31
local.description.notes: Imported from ARIES
local.description.refereed: Yes
local.identifier.absfor: 080109 - Pattern Recognition and Data Mining
local.identifier.ariespublication: U3594520xPUB161
local.identifier.doi: 10.1007/978-3-540-68125-0_45
local.identifier.scopusID: 2-s2.0-44649093306
local.type.status: Published Version
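
The two-step idea summarized in the abstract (select training examples automatically, then train a binary classifier on them) can be illustrated with a minimal sketch. The abstract does not state the selection criterion or the classifier, so everything specific here is an assumption made for illustration: each record pair is represented by a vector of similarity values in [0, 1], seeds are chosen by a hypothetical `radius` threshold around the "all fields agree" and "all fields disagree" vectors, and a simple nearest-centroid rule stands in for the binary classifier. It is not the paper's implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def select_seeds(vectors, radius):
    """Step 1 (assumed criterion): take vectors near total agreement as
    match seeds and vectors near total disagreement as non-match seeds."""
    n = len(vectors[0])
    all_agree, all_disagree = [1.0] * n, [0.0] * n
    matches = [v for v in vectors if dist(v, all_agree) <= radius]
    nonmatches = [v for v in vectors if dist(v, all_disagree) <= radius]
    return matches, nonmatches


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    if not vectors:
        raise ValueError("no seed examples selected; try a larger radius")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_nearest_centroid(matches, nonmatches):
    """Step 2 (stand-in classifier): label a vector by its closer seed centroid."""
    match_center, nonmatch_center = centroid(matches), centroid(nonmatches)

    def classify(v):
        closer_to_match = dist(v, match_center) < dist(v, nonmatch_center)
        return "match" if closer_to_match else "non-match"

    return classify


# Made-up similarity vectors for six record pairs (three compared fields each).
vectors = [
    [1.0, 0.9, 1.0],
    [0.9, 1.0, 0.95],
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.05],
    [0.7, 0.6, 0.8],
    [0.3, 0.2, 0.4],
]

seed_matches, seed_nonmatches = select_seeds(vectors, radius=0.2)
classify = train_nearest_centroid(seed_matches, seed_nonmatches)
labels = [classify(v) for v in vectors]
print(labels)
```

In this toy data the first four vectors fall inside the seed radius and serve as training examples; the last two are ambiguous pairs that only the trained classifier labels. Any other binary classifier could replace the nearest-centroid rule in step 2.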

Downloads

Original bundle

Name: 01_Christen_Automatic_Training_Example_2008.pdf
Size: 2.46 MB
Format: Adobe Portable Document Format

Name: 02_Christen_Automatic_Training_Example_2008.pdf
Size: 163.99 KB
Format: Adobe Portable Document Format