Thompson Sampling is Asymptotically Optimal in General Environments

Leike, Jan; Lattimore, Tor; Orseau, Laurent; Hutter, Marcus

dc.contributor.author	Leike, Jan
dc.contributor.author	Lattimore, Tor
dc.contributor.author	Orseau, Laurent
dc.contributor.author	Hutter, Marcus
dc.contributor.editor	Ihler, Alexander
dc.contributor.editor	Janzing, Dominik
dc.coverage.spatial	Jersey City, New Jersey, USA
dc.date.accessioned	2022-09-21T05:56:12Z
dc.date.created	June 25-29 2016
dc.date.issued	2016
dc.date.updated	2021-08-01T08:41:28Z
dc.description.abstract	We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear.	en_AU
dc.format.mimetype	application/pdf	en_AU
dc.identifier.isbn	9781510827806	en_AU
dc.identifier.uri	http://hdl.handle.net/1885/272947
dc.language.iso	en_AU	en_AU
dc.publisher	AUAI Press	en_AU
dc.relation.ispartofseries	32nd Conference on Uncertainty in Artificial Intelligence 2016	en_AU
dc.rights	© 2016 AUAI Press	en_AU
dc.source	Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence	en_AU
dc.subject	General reinforcement learning	en_AU
dc.subject	Thompson sampling	en_AU
dc.subject	asymptotic optimality	en_AU
dc.subject	regret	en_AU
dc.subject	discounting	en_AU
dc.subject	recoverability	en_AU
dc.subject	AIXI	en_AU
dc.title	Thompson Sampling is Asymptotically Optimal in General Environments	en_AU
dc.type	Conference paper	en_AU
dcterms.accessRights	Free Access via Publisher site	en_AU
local.bibliographicCitation.lastpage	426	en_AU
local.bibliographicCitation.startpage	417	en_AU
local.contributor.affiliation	Leike, Jan, College of Engineering and Computer Science, ANU	en_AU
local.contributor.affiliation	Lattimore, Tor, University of Alberta	en_AU
local.contributor.affiliation	Orseau, Laurent, Google DeepMind	en_AU
local.contributor.affiliation	Hutter, Marcus, College of Engineering and Computer Science, ANU	en_AU
local.contributor.authoruid	Leike, Jan, u5485774	en_AU
local.contributor.authoruid	Hutter, Marcus, u4350841	en_AU
local.description.embargo	2099-12-31
local.description.notes	Imported from ARIES	en_AU
local.description.refereed	Yes
local.identifier.ariespublication	u6048437xPUB382	en_AU
local.publisher.url	https://www.auai.org/	en_AU
local.type.status	Published Version	en_AU

Downloads

Original bundle

Name: Thomson sampling is asymptotically optimal.pdf
Size: 191.88 KB
Format: Adobe Portable Document Format

Collections

ANU Research Publications