Online learning algorithms for reinforcement learning with function approximation
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Robards, Matthew Walters | |
| dc.date.accessioned | 2018-11-22T00:06:42Z | |
| dc.date.available | 2018-11-22T00:06:42Z | |
| dc.date.copyright | 2011 | |
| dc.date.issued | 2011 | |
| dc.date.updated | 2018-11-21T03:29:04Z | |
| dc.description.abstract | Reinforcement learning deals with the problem of sequential decision making in uncertain, stochastic environments. In this thesis I study agents that attempt to solve the reinforcement learning problem online and in real time. This setting presents experimental challenges, for which I introduce novel kernelised algorithms. Kernel algorithms are very useful in reinforcement learning because they enable learning in situations that would otherwise require a very high-dimensional or hand-engineered feature vector. I also address the theoretical challenges that arise from online on-policy algorithms, for which I introduce a type of analysis that is novel (and useful) in reinforcement learning because it places no restrictive assumptions on the behaviour policy. I introduce three novel algorithms that aim to advance kernel, empirical and theoretical reinforcement learning. The first is a kernel extension of SARSA, introduced for its empirical properties, namely its incorporation of eligibility traces into sparse kernel algorithms. I then present a model-free/model-based pair of algorithms that use gradient-based methods for online learning. I accompany them with a regret analysis, which allows the learned value functions to be analysed without probabilistic assumptions, and hence without assumptions on the behaviour policy. Along the way I also make a novel "sub-contribution": non-squared loss functions for reinforcement learning. The use of different loss functions is a running theme through the algorithms I introduce, as I show that various loss functions that are non-traditional in reinforcement learning can be useful both for the efficiency of the algorithm and for accuracy, by ensuring smooth function approximations. I present thorough experimental and theoretical analyses throughout. | |
| dc.format.extent | xxvi, 141 leaves. | |
| dc.identifier.other | b2878952 | |
| dc.identifier.uri | http://hdl.handle.net/1885/150825 | |
| dc.language.iso | en_AU | en_AU |
| dc.rights | Author retains copyright | en_AU |
| dc.subject.lcc | Q325.6.R63 2011 | |
| dc.subject.lcsh | Reinforcement learning | |
| dc.subject.lcsh | Algorithms | |
| dc.subject.lcsh | Kernel functions | |
| dc.title | Online learning algorithms for reinforcement learning with function approximation | |
| dc.type | Thesis (PhD) | en_AU |
| dcterms.accessRights | Open Access | en_AU |
| local.contributor.affiliation | Australian National University. Research School of Computer Science. | |
| local.description.notes | Thesis (Ph.D.)--Australian National University | en_AU |
| local.identifier.doi | 10.25911/5d51538ad356e | |
| local.mintdoi | mint | |
| local.type.status | Accepted Version | en_AU |