
Bayesian reinforcement learning with exploration

Lattimore, Tor; Hutter, Marcus


We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
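The abstract's central idea, switching between a Bayes-optimal policy and a dedicated exploring policy, can be caricatured in a few lines. This is a schematic sketch only, not the paper's algorithm: the function names, the `explore_bonus` signal, and the threshold rule are all invented here for illustration.

```python
def combined_policy(bayes_action, explore_action, explore_bonus, threshold):
    """Illustrative policy combination (not the paper's method):
    take the exploring policy's action when the estimated value of
    exploring exceeds a threshold, otherwise act Bayes-optimally."""
    if explore_bonus > threshold:
        return explore_action   # uncertainty still high: explore
    return bayes_action         # uncertainty low: exploit the posterior
```

Under a scheme like this, once the exploration signal falls below the threshold the agent acts Bayes-optimally thereafter, which is the intuition behind bounding the number of exploratory (non-optimal) steps.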

Collections: ANU Research Publications
Date published: 2014-10
Type: Conference paper
DOI: 10.1007/978-3-319-11662-4_13
Access Rights: Open Access


File: Lattimore and Hutter Bayesian Reinforcement Learning 2014.pdf (392.15 kB, Adobe PDF)

