Fast learning from distributed datasets without entity matching

Patrini, Giorgio; Nock, Richard; Hardy, Stephen; Caetano, Tiberio

Date

2016

Authors

Patrini, Giorgio

Nock, Richard

Hardy, Stephen

Caetano, Tiberio

Publisher

AAAI Press

Abstract

Consider the following scenario: two datasets/peers contain the same real-world entities described using partially shared features, e.g. banking and insurance company records of the same customer base. Our goal is to learn a classifier in the cross product space of the two domains, in the hard case in which no shared ID is available, e.g. due to anonymization. Traditionally, the problem is approached by first addressing entity matching and subsequently learning the classifier in a standard manner. We present an end-to-end solution which bypasses matching entities, based on the recently introduced concept of Rademacher observations (rados). Informally, we replace the minimization of a loss over examples, which requires entity resolution, by the equivalent minimization of a (different) loss over rados. We show that (i) a potentially exponential-size subset of these rados does not require entity matching, and (ii) the algorithm that provably minimizes the loss over rados has time and space complexities smaller than the algorithm minimizing the equivalent example loss. Last, we relax a key assumption, that the data is vertically partitioned among peers; in this case, we would not even know the existence of a solution to entity resolution. In this more general setting, experiments validate the possibility of beating even the optimal peer in hindsight.
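
The equivalence between a loss over examples and a (different) loss over rados can be illustrated on a toy dataset. The sketch below is not taken from this record. It assumes the definition of a Rademacher observation used in the earlier work on rados, pi_sigma = (1/2) * sum_i (sigma_i + y_i) * x_i for sigma in {-1, +1}^m, and it assumes the pairing of the logistic loss over examples with the exponential loss over rados. All function names are hypothetical, and the code enumerates all 2^m rados, which is only feasible for very small m. It does not attempt to reproduce the paper's algorithm or its entity-matching-free subset of rados.

```python
import itertools
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def rado(X, y, sigma):
    """Assumed definition: pi_sigma = 0.5 * sum_i (sigma_i + y_i) * x_i."""
    d = len(X[0])
    pi = [0.0] * d
    for x_i, y_i, s_i in zip(X, y, sigma):
        weight = 0.5 * (s_i + y_i)  # equals y_i when s_i == y_i, else 0
        for k in range(d):
            pi[k] += weight * x_i[k]
    return pi


def logistic_loss(X, y, theta):
    """Average logistic loss over the m labelled examples."""
    m = len(X)
    return sum(
        math.log1p(math.exp(-y_i * dot(theta, x_i))) for x_i, y_i in zip(X, y)
    ) / m


def exp_rado_loss(X, y, theta):
    """Average exponential loss over all 2^m rados (brute force)."""
    m = len(X)
    total = sum(
        math.exp(-dot(theta, rado(X, y, sigma)))
        for sigma in itertools.product((-1, 1), repeat=m)
    )
    return total / 2 ** m


def make_toy_data(m=6, d=3, seed=0):
    """Hypothetical random data: features in [-1, 1], labels in {-1, +1}."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(m)]
    y = [rng.choice((-1, 1)) for _ in range(m)]
    theta = [rng.uniform(-1, 1) for _ in range(d)]
    return X, y, theta


X, y, theta = make_toy_data()
lhs = logistic_loss(X, y, theta)
rhs = math.log(2) + math.log(exp_rado_loss(X, y, theta)) / len(X)
print(abs(lhs - rhs) < 1e-9)  # checks the identity numerically
```

Under these assumptions the identity holds for any classifier theta, because the sum over all sign vectors factorizes into a product of (1 + exp(-y_i * theta . x_i)) terms, one per example. Minimizing either loss therefore yields the same classifier.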

URI

http://hdl.handle.net/1885/154182

Collections

ANU Research Publications

Source

IJCAI International Joint Conference on Artificial Intelligence

Type

Conference paper

Access Statement

Open Access

Downloads

File

Description

01_Patrini_Fast_learning_from_distributed_2016.pdf (2.81 MB)
