Skip navigation
Skip navigation

Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution

Ramadan, Banda; Christen, Peter; Liang, Huizhi (Elly); Gayler, Ross

Description

Real-time Entity Resolution (ER) is the process of matching query records in subsecond time with records in a database that represent the same real-world entity. Indexing techniques are generally used to efficiently extract a set of candidate records from the database that are similar to a query record, and that are to be compared with the query record in more detail. The sorted neighborhood indexing method, which sorts a database and compares records within a sliding window, has been...[Show more]

dc.contributor.authorRamadan, Banda
dc.contributor.authorChristen, Peter
dc.contributor.authorLiang, Huizhi (Elly)
dc.contributor.authorGayler, Ross
dc.date.accessioned2016-02-24T22:41:23Z
dc.identifier.issn1936-1955
dc.identifier.urihttp://hdl.handle.net/1885/98664
dc.description.abstractReal-time Entity Resolution (ER) is the process of matching query records in subsecond time with records in a database that represent the same real-world entity. Indexing techniques are generally used to efficiently extract a set of candidate records from the database that are similar to a query record, and that are to be compared with the query record in more detail. The sorted neighborhood indexing method, which sorts a database and compares records within a sliding window, has been successfully used for ER of large static databases. However, because it is based on static sorted arrays and is designed for batch ER that resolves all records in a database rather than resolving those relating to a single query record, this technique is not suitable for real-time ER on dynamic databases that are constantly updated. We propose a tree-based technique that facilitates dynamic indexing based on the sorted neighborhood method, which can be used for real-time ER, and investigate both static and adaptive window approaches. We propose an approach to reduce query matching times by precalculating the similarities between attribute values stored in neighboring tree nodes. We also propose a multitree solution where different sorting keys are used to reduce the effects of errors and variations in attribute values on matching quality by building several distinct index trees. We experimentally evaluate our proposed techniques on large real datasets, as well as on synthetic data with different data quality characteristics. Our results show that as the index grows, no appreciable increase occurs in both record insertion and query times, and that using multiple trees gives noticeable improvements on matching quality with only a small increase in query time. Compared to earlier indexing techniques for real-time ER, our approach achieves significantly reduced indexing and query matching times while maintaining high matching accuracy.
dc.publisherAssociation for Computing Machinary, Inc.
dc.sourceJournal of Data and Information Quality
dc.titleDynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution
dc.typeJournal article
local.description.notesImported from ARIES
local.identifier.citationvolume6
dc.date.issued2015
local.identifier.absfor080403 - Data Structures
local.identifier.ariespublicationU3488905xPUB6700
local.type.statusPublished Version
local.contributor.affiliationRamadan, Banda, College of Engineering and Computer Science, ANU
local.contributor.affiliationChristen, Peter, College of Engineering and Computer Science, ANU
local.contributor.affiliationLiang, Huizhi (Elly), College of Engineering and Computer Science, ANU
local.contributor.affiliationGayler, Ross, Veda Advantage
local.description.embargo2037-12-31
local.bibliographicCitation.issue4
local.bibliographicCitation.startpage1
local.bibliographicCitation.lastpage29
local.identifier.doi10.1145/2816821
local.identifier.absseo970108 - Expanding Knowledge in the Information and Computing Sciences
dc.date.updated2016-02-24T10:11:32Z
local.identifier.scopusID2-s2.0-84946100321
CollectionsANU Research Publications

Download

File Description SizeFormat Image
01_Ramadan_Dynamic_Sorted_Neighborhood_2015.pdf1.6 MBAdobe PDF    Request a copy


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated:  17 November 2022/ Responsible Officer:  University Librarian/ Page Contact:  Library Systems & Web Coordinator