Bayesian reinforcement learning with exploration

Lattimore, Tor; Hutter, Marcus

Description

We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.

Collections: ANU Research Publications
Date published: 2014-10
Type: Conference paper
URI: http://hdl.handle.net/1885/14709
DOI: 10.1007/978-3-319-11662-4_13

Download

File: Lattimore and Hutter Bayesian Reinforcement Learning 2014.pdf
Size: 392.15 kB
Format: Adobe PDF

