Thompson Sampling is Asymptotically Optimal in General Environments

Authors

Leike, Jan
Lattimore, Tor
Orseau, Laurent
Hutter, Marcus

Publisher

AUAI Press

Abstract

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically, its value converges in mean to the optimal value and (2) given a recoverability assumption, regret is sublinear.

Source

Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence

Access Statement

Free Access via Publisher site

Restricted until

2099-12-31