Generic Reinforcement Learning Beyond Small MDPs
Date
2015
Authors
Daswani, Mayank
Abstract
Feature reinforcement learning (FRL) is a framework within which an agent can automatically reduce a complex environment to a Markov Decision Process (MDP) by finding a map which aggregates similar histories into the states of an MDP. The primary motivation behind this thesis is to build FRL agents that work in practice, both for larger environments and larger classes of environments. We focus on empirical work targeted at practitioners in the field of general reinforcement learning, with theoretical results wherever necessary.
The current state-of-the-art in FRL uses suffix trees, which have issues with large observation spaces and long-term dependencies. We start by addressing the issue of long-term dependencies using a class of maps known as looping suffix trees, which have previously been used to represent deterministic POMDPs. We achieve the best results to date on the TMaze domain and good results on larger domains that require long-term memory.
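To make the aggregation idea concrete, here is a minimal illustrative sketch (not the thesis implementation): a suffix-tree-style map assigns a history to a state by matching the longest stored suffix of recent observations, and looping suffix trees extend this with edges that loop back towards the root, so a tree of bounded size can skip over arbitrarily long irrelevant stretches of history. The suffixes and names below are hypothetical.

```python
# Illustrative sketch of a suffix-based history-to-state map phi.
# Looping suffix trees generalise this: loop edges let a bounded tree
# ignore long irrelevant spans and still recover long-term dependencies.

def make_suffix_map(suffixes):
    """suffixes: iterable of observation tuples, each defining one state."""
    # Sort longest-first so the most specific matching suffix wins.
    table = sorted((tuple(s) for s in suffixes), key=len, reverse=True)
    state_of = {s: i for i, s in enumerate(table)}

    def phi(history):
        """Map a full observation history to a discrete state index."""
        for s in table:
            if tuple(history[-len(s):]) == s:
                return state_of[s]
        return len(table)          # fallback state for unmatched histories

    return phi

# Example: two states distinguished by the last one or two observations.
phi = make_suffix_map([("door",), ("key", "door")])
print(phi(["wall", "key", "door"]))   # state for suffix ("key", "door")
print(phi(["wall", "door"]))          # state for suffix ("door",)
```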
We introduce a new value-based cost function that can be evaluated model-free. The value-based cost allows for smaller representations, and its model-free nature allows for its extension to the function approximation setting, which has computational and representational advantages for large state spaces. We evaluate the performance of this new cost in both the tabular and function approximation settings on a variety of domains, and show performance better than the state-of-the-art algorithm MC-AIXI-CTW on the POCMAN domain.
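The abstract does not spell out the cost, so the sketch below only assumes one plausible form: the squared temporal-difference residual of a Q-table induced by a candidate map, accumulated over a sampled history. Because it needs only the history itself, such a cost can be evaluated model-free; all names here are illustrative rather than the thesis' actual objective.

```python
# Hedged sketch of a value-based, model-free cost for a candidate map phi.
# Assumption: the cost is the squared TD residual of a Q-table induced by
# phi, estimated from a single recorded trajectory (no environment model).
from collections import defaultdict

def td_cost(phi, trajectory, gamma=0.99, passes=5, alpha=0.1):
    """trajectory: list of (history, action, reward, next_history) tuples."""
    Q = defaultdict(float)
    actions = {a for _, a, _, _ in trajectory}

    # Fit Q by a few Q-learning sweeps over the fixed trajectory.
    for _ in range(passes):
        for h, a, r, h2 in trajectory:
            s, s2 = phi(h), phi(h2)
            target = r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

    # The cost is the remaining squared TD residual: it is small when phi
    # yields states over which the recorded data looks Markovian in value.
    return sum(
        (r + gamma * max(Q[(phi(h2), b)] for b in actions) - Q[(phi(h), a)]) ** 2
        for h, a, r, h2 in trajectory
    )
```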
When the environment is very large, an FRL agent needs to explore systematically in order to find a good representation; however, it needs a good representation in order to perform this systematic exploration. We decouple the two by considering a different setting, one where the agent has access to the value of any state-action pair from an oracle during a training phase. The agent must learn an approximate representation of the optimal value function. We formulate a regression-based solution built on online learning methods to construct such an agent. We test this agent on the Arcade Learning Environment using a simple class of linear function approximators.
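As a rough picture of this training-phase setting, the sketch below assumes an oracle that returns the optimal value of any queried state-action pair and fits a linear approximator by online stochastic gradient descent on the squared regression error; feature(), oracle() and the query set are placeholders, not the interfaces actually used in the thesis.

```python
# Minimal sketch of regressing a linear value approximator onto oracle
# values during a training phase (all interfaces are hypothetical).
import numpy as np

def train_linear_q(oracle, feature, queries, dim, lr=1e-3, epochs=10):
    """oracle(s, a) -> optimal value; feature(s, a) -> np.ndarray of size dim."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for s, a in queries:
            x = feature(s, a)
            err = oracle(s, a) - w @ x       # regression residual
            w += lr * err * x                # SGD step on the squared error
    return w

def greedy_action(w, feature, s, actions):
    """At test time the learned weights select actions greedily."""
    return max(actions, key=lambda a: w @ feature(s, a))
```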
While we made progress on the issue of scalability, two major issues with the FRL framework remain: the need for a stochastic search method to minimise the objective function and the need to store an uncompressed history, both of which can be very computationally demanding.
Keywords
reinforcement learning, artificial intelligence, AGI, Markov Decision Processes, MDP, function approximation, suffix trees, looping suffix trees, general learning agents, partially observable, POMDP, imitation learning, DAgger, Atari games, Arcade Learning Environment
Type
Thesis (PhD)