A comparison of personal name matching: Techniques and practical issues
| dc.contributor.author | Christen, Peter | |
| dc.contributor.editor | Conference Program Committee | |
| dc.coverage.spatial | Hong Kong | |
| dc.date.accessioned | 2007-02-07T05:16:34Z | en_US |
| dc.date.accessioned | 2011-01-05T08:38:21Z | |
| dc.date.available | 2007-02-07T05:16:34Z | en_US |
| dc.date.available | 2011-01-05T08:38:21Z | |
| dc.date.created | 2006-09 | en_US |
| dc.date.issued | 2006-09 | en_US |
| dc.date.updated | 2015-12-08T09:07:36Z | |
| dc.description.abstract | Finding and matching personal names is at the core of an increasing number of applications: from text and Web mining, information retrieval and extraction, search engines, to deduplication and data linkage systems. Variations and errors in names make exact string matching problematic, and approximate matching techniques based on phonetic encoding or pattern matching have to be applied. When compared to general text, however, personal names have different characteristics that need to be considered. ¶ In this paper we discuss the characteristics of personal names and present potential sources of variations and errors. We survey a broad range of commonly used, as well as some recently developed, name matching techniques. Experimental comparisons on four large name data sets indicate that there is no clear best technique. We provide a series of recommendations that will help researchers and practitioners to select a name matching technique suitable for a given data set. | |
| dc.identifier.citation | http://cs.anu.edu.au/techreports/2006/TR-CS-06-02.html | |
| dc.identifier.isbn | 1601320043 | |
| dc.identifier.uri | http://hdl.handle.net/1885/44521 | en_US |
| dc.identifier.uri | http://digitalcollections.anu.edu.au/handle/1885/44521 | |
| dc.language.iso | en | en_US |
| dc.publisher | Canberra, ACT: Dept. of Computer Science / Computer Sciences Laboratory, The Australian National University | en_AU |
| dc.relation.ispartofseries | Joint Computer Science Technical Report Series, no.06-02 | en_US |
| dc.source | Proceedings of the 2006 International Conference on Data Mining | |
| dc.source.uri | http://www.world-academy-of-science.org/worldcomp06/ws/publications/dmin06/index_html | |
| dc.subject | String matching | |
| dc.subject | phonetic encoding | |
| dc.subject | pattern matching | |
| dc.subject | data linkage | |
| dc.subject | personal name characteristics | |
| dc.subject | TR-CS | |
| dc.title | A comparison of personal name matching: Techniques and practical issues | |
| dc.type | Working/Technical Paper | en_AU |
| dcterms.accessRights | Open Access | en_AU |
| local.bibliographicCitation.lastpage | 294 | |
| local.bibliographicCitation.startpage | 290 | |
| local.citation | TR-CS-06-02 | en_US |
| local.contributor.affiliation | ANU | en_US |
| local.contributor.affiliation | Department of Computer Science, FEIT | en_US |
| local.contributor.authoruid | Christen, Peter, u4021539 | |
| local.description.refereed | no | en_US |
| local.identifier.absfor | 080109 - Pattern Recognition and Data Mining | |
| local.identifier.ariespublication | u4251866xPUB103 | |
| local.identifier.scopusID | 2-s2.0-78449293191 | |
| local.rights.ispublished | yes | en_US |
| local.type.status | Published version | en_AU |