Thompson Sampling is Asymptotically Optimal in General Environments
| dc.contributor.author | Leike, Jan | |
| dc.contributor.author | Lattimore, Tor | |
| dc.contributor.author | Orseau, Laurent | |
| dc.contributor.author | Hutter, Marcus | |
| dc.contributor.editor | Ihler, Alexander | |
| dc.contributor.editor | Janzing, Dominik | |
| dc.coverage.spatial | Jersey City, New Jersey, USA | |
| dc.date.accessioned | 2022-09-21T05:56:12Z | |
| dc.date.created | June 25-29 2016 | |
| dc.date.issued | 2016 | |
| dc.date.updated | 2021-08-01T08:41:28Z | |
| dc.description.abstract | We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear. | en_AU |
| dc.format.mimetype | application/pdf | en_AU |
| dc.identifier.isbn | 9781510827806 | en_AU |
| dc.identifier.uri | http://hdl.handle.net/1885/272947 | |
| dc.language.iso | en_AU | en_AU |
| dc.publisher | AUAI Press | en_AU |
| dc.relation.ispartofseries | 32nd Conference on Uncertainty in Artificial Intelligence 2016 | en_AU |
| dc.rights | © 2016 AUAI Press | en_AU |
| dc.source | Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence | en_AU |
| dc.subject | General reinforcement learning | en_AU |
| dc.subject | Thompson sampling | en_AU |
| dc.subject | asymptotic optimality | en_AU |
| dc.subject | regret | en_AU |
| dc.subject | discounting | en_AU |
| dc.subject | recoverability | en_AU |
| dc.subject | AIXI | en_AU |
| dc.title | Thompson Sampling is Asymptotically Optimal in General Environments | en_AU |
| dc.type | Conference paper | en_AU |
| dcterms.accessRights | Free Access via Publisher site | en_AU |
| local.bibliographicCitation.lastpage | 426 | en_AU |
| local.bibliographicCitation.startpage | 417 | en_AU |
| local.contributor.affiliation | Leike, Jan, College of Engineering and Computer Science, ANU | en_AU |
| local.contributor.affiliation | Lattimore, Tor, University of Alberta | en_AU |
| local.contributor.affiliation | Orseau, Laurent, Google DeepMind | en_AU |
| local.contributor.affiliation | Hutter, Marcus, College of Engineering and Computer Science, ANU | en_AU |
| local.contributor.authoruid | Leike, Jan, u5485774 | en_AU |
| local.contributor.authoruid | Hutter, Marcus, u4350841 | en_AU |
| local.description.embargo | 2099-12-31 | |
| local.description.notes | Imported from ARIES | en_AU |
| local.description.refereed | Yes | |
| local.identifier.ariespublication | u6048437xPUB382 | en_AU |
| local.publisher.url | https://www.auai.org/ | en_AU |
| local.type.status | Published Version | en_AU |
Downloads
Original bundle
- Name: Thomson sampling is asymptotically optimal.pdf
- Size: 191.88 KB
- Format: Adobe Portable Document Format