Nonparametric General Reinforcement Learning
Reinforcement learning problems are often phrased in terms of Markov decision processes (MDPs). In this thesis we go beyond MDPs and consider reinforcement learning in environments that are non-Markovian, non-ergodic and only partially observable. Our focus is not on practical algorithms, but rather on the fundamental underlying problems: How do we balance exploration and exploitation? How do we explore optimally? When is an agent optimal? We follow the...[Show more]
|Collections||Open Access Theses|
|Leike Thesis 2016.pdf||1.42 MB||Adobe PDF|
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.