
Feature reinforcement learning using looping suffix trees

Daswani, Mayank; Sunehag, Peter; Hutter, Marcus

Description

There has recently been much interest in history-based methods using suffix trees to solve POMDPs. However, these suffix trees cannot efficiently represent environments that have long-term dependencies. We extend the recently introduced CTΦMDP algorithm to the space of looping suffix trees which have previously only been used in solving deterministic POMDPs. The resulting algorithm replicates results from CTΦMDP for environments with short term dependencies, while it outperforms LSTM-based methods on TMaze, a deep memory environment.

dc.contributor.author: Daswani, Mayank
dc.contributor.author: Sunehag, Peter
dc.contributor.author: Hutter, Marcus
dc.date.accessioned: 2015-08-14T05:24:58Z
dc.date.available: 2015-08-14T05:24:58Z
dc.identifier.issn: 1532-4435
dc.identifier.uri: http://hdl.handle.net/1885/14724
dc.description.abstract: There has recently been much interest in history-based methods using suffix trees to solve POMDPs. However, these suffix trees cannot efficiently represent environments that have long-term dependencies. We extend the recently introduced CTΦMDP algorithm to the space of looping suffix trees which have previously only been used in solving deterministic POMDPs. The resulting algorithm replicates results from CTΦMDP for environments with short term dependencies, while it outperforms LSTM-based methods on TMaze, a deep memory environment.
dc.publisher: Journal of Machine Learning Research
dc.relation.ispartof: 10th European Workshop on Reinforcement Learning: JMLR: Workshop and Conference Proceedings 24
dc.rights: © 2012 M. Daswani, P. Sunehag & M. Hutter. Author can archive publisher’s version/PDF. http://www.sherpa.ac.uk/romeo/issn/1532-4435/ as at 14/8/15
dc.subject: looping suffix trees
dc.subject: Markov decision process
dc.subject: reinforcement learning
dc.subject: partial observability
dc.subject: Monte Carlo search
dc.subject: rational agents
dc.title: Feature reinforcement learning using looping suffix trees
dc.type: Conference paper
dc.date.issued: 2012-12
local.type.status: Published Version
local.contributor.affiliation: Hutter, M., Research School of Computer Science, The Australian National University
dc.relation: http://purl.org/au-research/grants/arc/DP120100950
local.bibliographicCitation.startpage: 11
local.bibliographicCitation.lastpage: 23
Collections: ANU Research Publications

Download

File: Daswani et al Feature Reinforcement Learning 2012.pdf
Size: 301.79 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
