Bayesian reinforcement learning with exploration

Lattimore, Tor; Hutter, Marcus


Abstract: We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
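A minimal sketch of the policy-combination structure the abstract describes: act Bayes-optimally with respect to the current posterior, except when some action promises enough information gain, in which case explore. This is not the paper's algorithm (which handles general history-based environments); it is a toy finite-hypothesis bandit analogue, and the hypothesis set, threshold EPSILON, and entropy-based information-gain gate are all illustrative assumptions.

```python
import math
import random

# Toy illustration (not the paper's algorithm): two-armed Bernoulli bandit,
# finite hypothesis set over the arm means, posterior kept by Bayes' rule.
# The agent plays the Bayes-optimal arm unless some arm's expected
# information gain exceeds a threshold, in which case it explores.

HYPOTHESES = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]  # candidate (mean0, mean1)
EPSILON = 0.05  # exploration threshold on expected information gain (assumed)

def entropy(post):
    return -sum(p * math.log(p) for p in post if p > 0)

def update(post, arm, reward):
    """Posterior update after observing a Bernoulli reward from `arm`."""
    like = [h[arm] if reward == 1 else 1 - h[arm] for h in HYPOTHESES]
    unnorm = [p * l for p, l in zip(post, like)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def expected_info_gain(post, arm):
    """Expected entropy reduction of the posterior from pulling `arm`."""
    p1 = sum(p * h[arm] for p, h in zip(post, HYPOTHESES))  # P(reward = 1)
    gain = entropy(post)
    for r, pr in ((1, p1), (0, 1 - p1)):
        if pr > 0:
            gain -= pr * entropy(update(post, arm, r))
    return gain

def act(post):
    """Explore if informative enough, otherwise act Bayes-optimally."""
    gains = [expected_info_gain(post, a) for a in (0, 1)]
    if max(gains) > EPSILON:
        return gains.index(max(gains))          # exploring policy
    means = [sum(p * h[a] for p, h in zip(post, HYPOTHESES)) for a in (0, 1)]
    return means.index(max(means))              # Bayes-optimal policy

if __name__ == "__main__":
    random.seed(0)
    true_means = HYPOTHESES[0]
    post = [1 / len(HYPOTHESES)] * len(HYPOTHESES)
    for t in range(50):
        arm = act(post)
        reward = 1 if random.random() < true_means[arm] else 0
        post = update(post, arm, reward)
    print("final posterior:", [round(p, 3) for p in post])
```

Once the posterior concentrates, no action clears the information-gain threshold and the agent behaves purely Bayes-optimally, which mirrors the adaptive behaviour claimed in the abstract: exploration is only paid for while the environment is still uncertain.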

Collections: ANU Research Publications
Date published: 2014-10
Type: Conference paper
DOI: 10.1007/978-3-319-11662-4_13


File: Lattimore and Hutter Bayesian Reinforcement Learning 2014.pdf
Size: 392.15 kB
Format: Adobe PDF

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
