
Bayesian reinforcement learning with exploration

Lattimore, Tor; Hutter, Marcus


Abstract: We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
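The abstract's central idea is an agent that alternates between a Bayes-optimal policy and a dedicated exploring policy. The following is a minimal illustrative sketch of that general combination pattern only; the function names and the fixed switching probability are hypothetical, and the paper's actual switching rule (and its sample-complexity analysis) is more refined than a simple probabilistic mix.

```python
import random


def combined_policy(bayes_action, explore_action, explore_prob, rng=random):
    """Pick an action by mixing a Bayes-optimal policy with an exploring one.

    bayes_action:   callable returning the Bayes-optimal action (assumed given).
    explore_action: callable returning an action from the exploring policy.
    explore_prob:   probability of deferring to the exploring policy this step.

    Hypothetical sketch: a fixed mixing probability stands in for the
    paper's more careful criterion for when exploration is worthwhile.
    """
    if rng.random() < explore_prob:
        return explore_action()
    return bayes_action()
```

For example, `combined_policy(pi_bayes, pi_explore, 0.1)` would follow the exploring policy on roughly one step in ten and act Bayes-optimally otherwise.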

Collections: ANU Research Publications
Date published: 2014
Type: Conference paper
Source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 8776
DOI: 10.1007/978-3-319-11662-4_13
Access Rights: Open Access


File: 01_Lattimore_Bayesian_reinforcement_2014.pdf (317.25 kB, Adobe PDF)

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated: 19 May 2020 / Responsible Officer: University Librarian / Page Contact: Library Systems & Web Coordinator