Online learning algorithms for reinforcement learning with function approximation

dc.contributor.author: Robards, Matthew Walters
dc.date.accessioned: 2018-11-22T00:06:42Z
dc.date.available: 2018-11-22T00:06:42Z
dc.date.copyright: 2011
dc.date.issued: 2011
dc.date.updated: 2018-11-21T03:29:04Z
dc.description.abstract: Reinforcement learning deals with the problem of sequential decision making in uncertain stochastic environments. In this thesis I deal with agents who attempt to solve the reinforcement learning problem online and in real-time. This presents experimental challenges for which I introduce novel kernelised algorithms. Kernel algorithms are very useful in reinforcement learning settings as they enable learning in situations where a very high-dimensional or hand-engineered feature vector would otherwise be required. Furthermore, I attempt to address the theoretical challenges which arise from online on-policy algorithms, for which I introduce a type of analysis which is novel (and useful) to reinforcement learning in its lack of restrictive assumptions on the behaviour policy. I will introduce three novel algorithms attempting to advance the areas of kernel, empirical and theoretical reinforcement learning. The first of these algorithms presents a kernel extension of SARSA for its empirical properties - namely its incorporation of eligibility traces with sparse kernel algorithms. I then present a model-free/model-based ensemble which uses gradient-based methods for online learning. I present them with regret analysis which enables an analysis of the value functions learned with no probabilistic assumptions, and hence no assumptions on the behaviour policy. Along the way I also make a novel "sub-contribution", namely non-squared loss functions for reinforcement learning. The use of different loss functions constitutes a running theme through the algorithms I introduce, as I show that various non-traditional (to reinforcement learning) loss functions can be useful for both efficiency of the algorithm, and for accuracy by ensuring smooth function approximations. I present thorough experimental and theoretical analyses along the way.
dc.format.extent: xxvi, 141 leaves.
dc.identifier.other: b2878952
dc.identifier.uri: http://hdl.handle.net/1885/150825
dc.language.iso: en_AU (en_AU)
dc.rights: Author retains copyright (en_AU)
dc.subject.lcc: Q325.6.R63 2011
dc.subject.lcsh: Reinforcement learning
dc.subject.lcsh: Algorithms
dc.subject.lcsh: Kernel functions
dc.title: Online learning algorithms for reinforcement learning with function approximation
dc.type: Thesis (PhD) (en_AU)
dcterms.accessRights: Open Access (en_AU)
local.contributor.affiliation: Australian National University. Research School of Computer Science.
local.description.notes: Thesis (Ph.D.)--Australian National University (en_AU)
local.identifier.doi: 10.25911/5d51538ad356e
local.mintdoi: mint
local.type.status: Accepted Version (en_AU)
