Thompson Sampling is Asymptotically Optimal in General Environments
Authors
Leike, Jan
Lattimore, Tor
Orseau, Laurent
Hutter, Marcus
Publisher
AUAI Press
Abstract
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear.
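As a rough illustration of the idea in the abstract, the following is a minimal sketch of Thompson sampling over a finite class of candidate environments. It is not the paper's algorithm or setting: the environments here are simple Bernoulli bandits (a very special case of general environments), and the function name `thompson_sampling`, the list-of-probabilities environment encoding, and the fixed `horizon` parameter are all assumptions made for the example.

```python
import random


def thompson_sampling(envs, prior, true_env, steps, horizon=1, seed=0):
    """Illustrative Thompson sampling over a finite environment class.

    envs:     list of candidate environments; each is a list of per-arm
              reward probabilities (hypothetical encoding for this sketch).
    prior:    prior weight for each candidate environment.
    true_env: per-arm reward probabilities of the environment actually faced.
    horizon:  number of steps to follow one sample before resampling
              (a fixed value here, chosen only for illustration).
    """
    rng = random.Random(seed)
    posterior = list(prior)
    total_reward = 0
    t = 0
    while t < steps:
        # Sample one candidate environment from the current posterior.
        k = rng.choices(range(len(envs)), weights=posterior)[0]
        # Act optimally for the sampled environment: pick its best arm.
        arm = max(range(len(envs[k])), key=lambda a: envs[k][a])
        for _ in range(horizon):
            if t >= steps:
                break
            reward = 1 if rng.random() < true_env[arm] else 0
            total_reward += reward
            # Bayes update: reweight each candidate by the likelihood
            # it assigns to the observed reward, then renormalize.
            likelihoods = [e[arm] if reward else 1.0 - e[arm] for e in envs]
            posterior = [p * l for p, l in zip(posterior, likelihoods)]
            norm = sum(posterior)
            posterior = [p / norm for p in posterior]
            t += 1
    return posterior, total_reward
```

Called with two candidate environments such as `[[0.9, 0.1], [0.1, 0.9]]`, a uniform prior, and the second candidate as the true environment, the posterior weight should concentrate on the true environment as observations accumulate, which is the informal sense of "learning the environment class" in this toy case.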
Source
Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence
Access Statement
Free Access via Publisher site
Restricted until
2099-12-31