
Generic Reinforcement Learning Beyond Small MDPs

Daswani, Mayank


dc.contributor.author: Daswani, Mayank
dc.date.accessioned: 2016-11-24T00:49:44Z
dc.date.available: 2016-11-24T00:49:44Z
dc.identifier.other: b40393902
dc.identifier.uri: http://hdl.handle.net/1885/110545
dc.description.abstract: Feature reinforcement learning (FRL) is a framework within which an agent can automatically reduce a complex environment to a Markov Decision Process (MDP) by finding a map which aggregates similar histories into the states of an MDP. The primary motivation behind this thesis is to build FRL agents that work in practice, both for larger environments and for larger classes of environments. We focus on empirical work targeted at practitioners in the field of general reinforcement learning, with theoretical results wherever necessary.

The current state of the art in FRL uses suffix trees, which have issues with large observation spaces and long-term dependencies. We start by addressing the issue of long-term dependencies using a class of maps known as looping suffix trees, which have previously been used to represent deterministic POMDPs. We show the best existing results on the TMaze domain and good results on larger domains that require long-term memory.

We introduce a new value-based cost function that can be evaluated model-free. The value-based cost allows for smaller representations, and its model-free nature allows for its extension to the function approximation setting, which has computational and representational advantages for large state spaces. We evaluate the performance of this new cost in both the tabular and function approximation settings on a variety of domains, and show performance better than the state-of-the-art algorithm MC-AIXI-CTW on the domain POCMAN.

When the environment is very large, an FRL agent needs to explore systematically in order to find a good representation; however, it needs a good representation in order to perform this systematic exploration. We decouple the two by considering a different setting, one in which the agent has access to the value of any state-action pair from an oracle during a training phase. The agent must learn an approximate representation of the optimal value function. We formulate a regression-based solution based on online learning methods to build such an agent. We test this agent on the Arcade Learning Environment using a simple class of linear function approximators.

While we have made progress on the issue of scalability, two major issues with the FRL framework remain: the need for a stochastic search method to minimise the objective function, and the need to store an uncompressed history, both of which can be very computationally demanding.
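The core FRL idea described in the abstract (a map that aggregates histories into MDP states, over which standard RL is then run) can be sketched in a few lines. This is a hypothetical toy illustration, not code from the thesis: the map `phi` here is the simplest possible member of the suffix-tree class (keep only the last k percepts), and the environment interface, function names, and parameters are all invented for the example.

```python
import random
from collections import defaultdict

def phi(history, k=2):
    """Aggregate a history by keeping only its last k (action, observation)
    pairs: a fixed-depth suffix map, the simplest suffix-tree-style map."""
    return tuple(history[-k:])

def q_learning_on_map(env_step, actions, episodes=300,
                      alpha=0.1, gamma=0.9, eps=0.1):
    """Run tabular Q-learning on the aggregated states produced by phi.
    env_step(history, a) -> (obs, reward, done) is a stand-in interface."""
    Q = defaultdict(float)
    for _ in range(episodes):
        history, done = [], False
        while not done:
            s = phi(history)
            # epsilon-greedy action selection over the aggregated state
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            obs, reward, done = env_step(history, a)
            history.append((a, obs))
            s2 = phi(history)
            best_next = 0.0 if done else max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
    return Q
```

The point of the sketch is only the separation of concerns: once `phi` reduces histories to a finite state set, any tabular method applies; the thesis's contribution is in searching for a good `phi` (e.g. over looping suffix trees) rather than fixing it in advance.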
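The oracle setting in the abstract (a training phase in which the agent can query the value of any state-action pair, and must fit an approximate value function) amounts to online regression. A minimal sketch with a linear function approximator follows; the feature map, oracle, and function names are stand-ins chosen for illustration, and plain stochastic gradient descent on squared loss is used here in place of whatever online learning method the thesis actually employs.

```python
import numpy as np

def fit_to_oracle(features, oracle, pairs, lr=0.05, epochs=200):
    """Regress a weight vector w so that w . features(s, a) approximates
    oracle(s, a) over the queried (state, action) pairs, via online SGD."""
    dim = len(features(*pairs[0]))
    w = np.zeros(dim)
    for _ in range(epochs):
        for (s, a) in pairs:
            x = features(s, a)
            err = oracle(s, a) - w @ x   # residual against the oracle value
            w += lr * err * x            # squared-loss gradient step
    return w
```

When the oracle values are exactly linear in the features, this iteration converges to the exact weights; in the realistic case it finds a least-squares fit, which is the "approximate representation of the optimal value function" the setting asks for.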
dc.language.iso: en
dc.subject: reinforcement learning
dc.subject: artificial intelligence
dc.subject: AGI
dc.subject: Markov Decision Processes
dc.subject: MDP
dc.subject: function approximation
dc.subject: suffix trees
dc.subject: looping suffix trees
dc.subject: general learning agents
dc.subject: partially observable
dc.subject: POMDP
dc.subject: imitation learning
dc.subject: DAgger
dc.subject: Atari games
dc.subject: Arcade Learning Environment
dc.title: Generic Reinforcement Learning Beyond Small MDPs
dc.type: Thesis (PhD)
local.contributor.supervisor: Hutter, Marcus
local.contributor.supervisorcontact: marcus.hutter@anu.edu.au
dcterms.valid: 2016
local.description.notes: author deposited 24/11/16
local.type.degree: Doctor of Philosophy (PhD)
dc.date.issued: 2015
local.contributor.affiliation: Research School of Computer Science, The Australian National University
local.identifier.doi: 10.25911/5d7637291a901
local.mintdoi: mint
Collections: Open Access Theses

Download

File: Daswani Thesis 2016.pdf (1.37 MB, Adobe PDF)


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
