On Thompson sampling and asymptotic optimality

Authors

Leike, Jan
Lattimore, Tor
Orseau, Laurent
Hutter, Marcus

Publisher

International Joint Conferences on Artificial Intelligence

Abstract

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically, its value converges in mean to the optimal value, and (2) given a recoverability assumption, regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.

Source

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Access Statement

Free Access via Publisher Site

Restricted until

2099-12-31
