Monge blunts Bayes: Hardness results for adversarial training
Authors
Cranko, Zac
Menon, Aditya Krishna
Nock, Richard
Ong, Cheng Soon
Shi, Zhan
Walder, Christian
Publisher
Curran Associates, Inc.
Abstract
The last few years have seen a staggering number of empirical studies of the robustness of neural networks in a model of adversarial perturbations of their inputs. Most rely on an adversary which carries out local modifications within prescribed balls. None, however, has so far questioned the broader picture: how to frame a resource-bounded adversary so that it can be severely detrimental to learning, a non-trivial problem which entails at a minimum the choice of loss and classifiers. We suggest a formal answer for losses that satisfy the minimal statistical requirement of being proper. We pin down a simple sufficient property for any given class of adversaries to be detrimental to learning, involving a central measure of “harmfulness” which generalizes the well-known class of integral probability metrics. A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss. When classifiers are Lipschitz (a now popular approach in adversarial training), this optimisation resorts to optimal transport to make a low-budget compression of class marginals. Toy experiments reveal a finding recently observed independently: training against a sufficiently budgeted adversary of this kind improves generalization.
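For readers unfamiliar with the terms used in the abstract, the following recalls the standard textbook definition of an integral probability metric and its Lipschitz special case, which is where optimal transport enters. It is a reference sketch only: the symbols P and Q (standing for the two class marginals), the function class F, and the ground metric d are notational assumptions made here, and the paper's own "harmfulness" measure, which generalizes this class, is not spelled out in this record.

```latex
% Standard definition of an integral probability metric (IPM) between
% distributions P and Q, indexed by a function class \mathcal{F}.
\[
  \gamma_{\mathcal{F}}(P, Q)
  \;=\; \sup_{f \in \mathcal{F}}
  \left| \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)] \right|
\]
% Special case: taking \mathcal{F} to be the 1-Lipschitz functions (w.r.t. d)
% yields the Wasserstein-1 distance, by Kantorovich-Rubinstein duality.
% \Pi(P, Q) denotes the set of couplings of P and Q.
\[
  W_1(P, Q)
  \;=\; \sup_{\|f\|_{\mathrm{Lip}} \le 1}
  \left( \mathbb{E}_{P}[f] - \mathbb{E}_{Q}[f] \right)
  \;=\; \inf_{\pi \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \pi}\, d(x, y)
\]
```

Read this way, an adversary that "compresses" the class marginals is one that moves P and Q closer together under such a distance while staying within its transport budget.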
Source
Proceedings of the 36th International Conference on Machine Learning, ICML 2019
Access Statement
Free Access via publisher website
Restricted until
2099-12-31