
A Kernel Two-Sample Test

Gretton, Arthur; Borgwardt, Karsten; Rasch, Malte J; Schoelkopf, Bernhard; Smola, Alexander

dc.contributor.author: Gretton, Arthur
dc.contributor.author: Borgwardt, Karsten
dc.contributor.author: Rasch, Malte J
dc.contributor.author: Schoelkopf, Bernhard
dc.contributor.author: Smola, Alexander
dc.date.accessioned: 2015-12-13T22:22:40Z
dc.identifier.issn: 1532-4435
dc.identifier.uri: http://hdl.handle.net/1885/72353
dc.description.abstract: We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear-time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
dc.publisher: MIT Press
dc.source: Journal of Machine Learning Research
dc.subject: Keywords: Hypothesis testing; Integral probability metric; Kernel methods; Schema matching; Two-sample tests; Uniform convergence; Statistical tests; Probability distributions; Hypothesis testing; Integral probability metric; Kernel methods; Schema matching; Two-sample test; Uniform convergence bounds
dc.title: A Kernel Two-Sample Test
dc.type: Journal article
local.description.notes: Imported from ARIES
local.identifier.citationvolume: 13
dc.date.issued: 2012
local.identifier.absfor: 080599 - Distributed Computing not elsewhere classified
local.identifier.ariespublication: f5625xPUB3206
local.type.status: Published Version
local.contributor.affiliation: Gretton, Arthur, Max Planck Institute for Biological Cybernetics
local.contributor.affiliation: Borgwardt, Karsten, Max Planck Institutes
local.contributor.affiliation: Rasch, Malte J, Beijing Normal University
local.contributor.affiliation: Schoelkopf, Bernhard, Max Planck Institute for Biological Cybernetics
local.contributor.affiliation: Smola, Alexander, College of Engineering and Computer Science, ANU
local.description.embargo: 2037-12-31
local.bibliographicCitation.startpage: 723
local.bibliographicCitation.lastpage: 773
dc.date.updated: 2016-02-24T09:06:37Z
local.identifier.scopusID: 2-s2.0-84859477054
local.identifier.thomsonID: 000298596500007
Collections: ANU Research Publications
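The abstract describes the MMD as the largest difference in expectations over functions in the unit ball of an RKHS. For a kernel k, the squared MMD between distributions p and q can be written as MMD²(p, q) = E[k(x, x′)] + E[k(y, y′)] − 2 E[k(x, y)], with x, x′ drawn independently from p and y, y′ drawn independently from q. A minimal sketch of estimating this from two samples follows, assuming a Gaussian kernel with an illustrative bandwidth sigma = 1.0 and using only the Python standard library. The function names (gaussian_kernel, mmd2_unbiased, mmd2_linear, permutation_test) are hypothetical. The significance test shown is a permutation test, a swapped-in resampling approach rather than the large-deviation or asymptotic tests the abstract describes.

```python
import math
import random
from typing import Callable, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def gaussian_kernel(x: Point, y: Point, sigma: float = 1.0) -> float:
    # Assumption: Gaussian RBF kernel with an illustrative bandwidth sigma = 1.0.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(xs: Sequence[Point], ys: Sequence[Point],
                  kernel: Kernel = gaussian_kernel) -> float:
    """Quadratic-time unbiased estimate of MMD^2; needs at least 2 points per sample."""
    m, n = len(xs), len(ys)
    # Within-sample averages exclude the diagonal terms k(x_i, x_i).
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def mmd2_linear(xs: Sequence[Point], ys: Sequence[Point],
                kernel: Kernel = gaussian_kernel) -> float:
    """Linear-time estimate: average of h over disjoint pairs of points."""
    half = min(len(xs), len(ys)) // 2  # uses at most min(m, n) points from each sample
    total = 0.0
    for i in range(half):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        total += kernel(x1, x2) + kernel(y1, y2) - kernel(x1, y2) - kernel(x2, y1)
    return total / half


def permutation_test(xs: Sequence[Point], ys: Sequence[Point],
                     num_permutations: int = 200,
                     kernel: Kernel = gaussian_kernel, seed: int = 0):
    """Swapped-in permutation test on mmd2_unbiased; returns (statistic, p-value)."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, kernel)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # random relabelling, valid under H0: p = q
        if mmd2_unbiased(pooled[:m], pooled[m:], kernel) >= observed:
            exceed += 1
    # Add-one correction counts the observed split itself, so p is never 0.
    return observed, (exceed + 1) / (num_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [[rng.gauss(0.0, 1.0)] for _ in range(20)]
    ys = [[rng.gauss(5.0, 1.0)] for _ in range(20)]
    stat, p = permutation_test(xs, ys, num_permutations=100)
    print(f"MMD^2_u = {stat:.3f}, linear = {mmd2_linear(xs, ys):.3f}, p = {p:.3f}")
```

Run as a script, the sketch compares 20 draws from N(0, 1) with 20 draws from N(5, 1); for samples this well separated the statistic should be clearly positive and the p-value small. The quadratic-time estimator evaluates the kernel on all pairs of points, while the linear-time variant uses each point once, which is faster but has higher variance.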

Download

File: 01_Gretton_A_Kernel_Two-Sample_Test_2012.pdf (460.92 kB, Adobe PDF)


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated: 19 May 2020 / Responsible Officer: University Librarian / Page Contact: Library Systems & Web Coordinator