Bayesian reinforcement learning with exploration

Lattimore, Tor; Hutter, Marcus

Description

We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
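The paper itself contains no code; the following is a minimal illustrative sketch of the explore/exploit switch the abstract describes. It uses a Bernoulli bandit in place of the general history-based environment, Beta posteriors in place of the Bayesian mixture, a posterior-mean greedy rule as a crude stand-in for the Bayes-optimal policy, and posterior variance as a rough proxy for the information gain that drives the exploring policy. All names and thresholds below are assumptions made for illustration, not the paper's actual algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch only: none of these modelling choices come from
# the paper. The learner explores while the posterior is uncertain and
# otherwise follows (an approximation of) the Bayes-optimal policy.

true_means = np.array([0.3, 0.5, 0.7])   # unknown to the learner
alpha = np.ones(3)                        # Beta posterior parameters
beta = np.ones(3)
THRESHOLD = 0.02                          # explore above this uncertainty

for t in range(2000):
    # Posterior variance of each arm's mean reward, Var[Beta(a, b)].
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    if var.max() > THRESHOLD:
        # Exploring policy: pull the arm the posterior is least sure about.
        a = int(np.argmax(var))
    else:
        # Stand-in for the Bayes-optimal policy: act greedily with
        # respect to the posterior mean reward.
        a = int(np.argmax(alpha / (alpha + beta)))
    r = rng.random() < true_means[a]      # Bernoulli reward
    alpha[a] += r                         # conjugate Bayesian update
    beta[a] += 1 - r

print("posterior means:", alpha / (alpha + beta))

The switch rule mirrors the idea in the abstract: exploration is invoked only while it is expected to be informative, after which the agent falls back to acting (approximately) Bayes-optimally, which is what yields adaptive behaviour in environments easier than worst-case.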

Collections: ANU Research Publications
Date published: 2014
Type: Conference paper
URI: http://hdl.handle.net/1885/58180
Source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 8776
DOI: 10.1007/978-3-319-11662-4_13

Download

File: 01_Lattimore_Bayesian_reinforcement_2014.pdf
Size: 317.25 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
