Bayesian reinforcement learning with exploration
We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
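The abstract describes interleaving a Bayes-optimal policy with a deliberately exploring policy. As a rough illustration of that idea (not the paper's actual algorithm; the environment, the Beta-Bernoulli model, and the variance-based exploration test are all simplifying assumptions), here is a toy sketch in which an agent explores while posterior uncertainty is large and otherwise acts greedily with respect to its posterior:

```python
import random

class BayesExploreAgent:
    """Toy sketch: act Bayes-greedily, but switch to an exploring
    policy while posterior uncertainty remains large.
    Illustrative only -- not the paper's algorithm."""

    def __init__(self, n_arms, explore_threshold=0.02):
        # Beta(1, 1) prior over each Bernoulli arm's mean reward.
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms
        self.explore_threshold = explore_threshold

    def posterior_mean(self, arm):
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])

    def posterior_var(self, arm):
        a, b = self.alpha[arm], self.beta[arm]
        return a * b / ((a + b) ** 2 * (a + b + 1))

    def act(self):
        # Exploring policy: pull the most uncertain arm while any
        # posterior variance still exceeds the threshold ...
        arm = max(range(len(self.alpha)), key=self.posterior_var)
        if self.posterior_var(arm) > self.explore_threshold:
            return arm
        # ... otherwise the Bayes-greedy policy: best posterior mean.
        return max(range(len(self.alpha)), key=self.posterior_mean)

    def update(self, arm, reward):
        # Conjugate Beta-Bernoulli posterior update.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward

if __name__ == "__main__":
    random.seed(0)
    true_means = [0.3, 0.7]  # hypothetical two-armed environment
    agent = BayesExploreAgent(n_arms=2)
    for _ in range(2000):
        arm = agent.act()
        reward = 1 if random.random() < true_means[arm] else 0
        agent.update(arm, reward)
    print(max(range(2), key=agent.posterior_mean))
```

In an "easy" environment the posteriors concentrate quickly, so the exploration branch is abandoned early, loosely mirroring the adaptive behaviour the abstract claims when the environment is easier than worst-case.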
Collections: ANU Research Publications
File: Lattimore and Hutter Bayesian Reinforcement Learning 2014.pdf (392.15 kB, Adobe PDF)