Thompson Sampling is Asymptotically Optimal in General Environments

Leike, Jan; Lattimore, Tor; Orseau, Laurent; Hutter, Marcus

Date

2016

Authors

Leike, Jan

Lattimore, Tor

Orseau, Laurent

Hutter, Marcus

Publisher

AUAI Press

Abstract

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically, its value converges in mean to the optimal value, and (2) given a recoverability assumption, its regret is sublinear.

Keywords

General reinforcement learning, Thompson sampling, asymptotic optimality, regret, discounting, recoverability, AIXI

URI

http://hdl.handle.net/1885/272947

Collections

ANU Research Publications

Source

Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence

Type

Conference paper

Access Statement

Free Access via Publisher site

Restricted until

2099-12-31

Downloads

File

Thomson sampling is asymptotically optimal.pdf (191.88 KB)
