On Thompson sampling and asymptotic optimality

Leike, Jan; Lattimore, Tor; Orseau, Laurent; Hutter, Marcus

Date

2017

Authors

Leike, Jan

Lattimore, Tor

Orseau, Laurent

Hutter, Marcus

Publisher

International Joint Conferences on Artificial Intelligence

Abstract

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value and (2) given a recoverability assumption, regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.

URI

http://hdl.handle.net/1885/313140

Collections

ANU Research Publications

Source

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Type

Conference paper

Access Statement

Free Access via Publisher Site

DOI

10.24963/ijcai.2017/688

Restricted until

2099-12-31

Downloads

File

0688.pdf (112.42 KB)
