Bayesian reinforcement learning with exploration
We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
Collection: ANU Research Publications
Source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 8776