On Thompson sampling and asymptotic optimality
Date
2017
Authors
Leike, Jan
Lattimore, Tor
Orseau, Laurent
Hutter, Marcus
Publisher
International Joint Conferences on Artificial Intelligence
Abstract
We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically, its value converges in mean to the optimal value, and (2) given a recoverability assumption, its regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.
Source
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
Type
Conference paper
Access Statement
Free Access via Publisher Site
Restricted until
2099-12-31