
Reinforcement learning with value advice

Daswani, Mayank; Sunehag, Peter; Hutter, Marcus

Description

The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
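The setting described above can be illustrated with a small sketch of DAgger-style dataset aggregation driven by a value oracle. This is not the paper's RLAdvice algorithm: the six-state chain environment, the exact `oracle_q` function, the tabular averaging learner, and the `beta` mixing schedule are all assumptions chosen only to keep the example self-contained.

```python
import random

N_STATES = 6          # toy chain 0..5; state 5 is the terminal goal (assumption)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumption)


def step(state, action):
    """Deterministic chain dynamics: reward 1 on reaching the right end."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def oracle_q(state, action):
    """Hypothetical value oracle: exact optimal Q-value for this toy chain."""
    nxt, reward, done = step(state, action)
    if done:
        return reward
    # From nxt the optimal policy moves right; V*(nxt) = GAMMA ** (N_STATES - 2 - nxt).
    return reward + GAMMA * GAMMA ** (N_STATES - 2 - nxt)


def fit(dataset):
    """Tabular regression: average the advised values for each (state, action)."""
    sums, counts = {}, {}
    for s, a, q in dataset:
        sums[(s, a)] = sums.get((s, a), 0.0) + q
        counts[(s, a)] = counts.get((s, a), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def greedy(q_table, state):
    """Explicit policy: pick the action with the highest learned value (ties go left)."""
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))


def train(iterations=5, horizon=20, seed=0):
    """Aggregate oracle advice over states visited by a mixed oracle/learner policy."""
    rng = random.Random(seed)
    dataset, q_table = [], {}
    for i in range(iterations):
        beta = 1.0 if i == 0 else 0.5 ** i  # chance of following the oracle
        state = 0
        for _ in range(horizon):
            # Ask the oracle for the value of every action in the visited state.
            dataset.extend((state, a, oracle_q(state, a)) for a in ACTIONS)
            if rng.random() < beta:
                action = max(ACTIONS, key=lambda a: oracle_q(state, a))
            else:
                action = greedy(q_table, state)
            state, _, done = step(state, action)
            if done:
                break
        q_table = fit(dataset)  # retrain on everything collected so far
    return q_table


if __name__ == "__main__":
    learned = train()
    print([greedy(learned, s) for s in range(N_STATES - 1)])
```

The learner here only queries the oracle at states it actually visits, which is one simple way to read "limited access"; a real agent would replace the table with a function approximator and the exact oracle with estimates such as those from UCT.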

dc.contributor.author: Daswani, Mayank
dc.contributor.author: Sunehag, Peter
dc.contributor.author: Hutter, Marcus
dc.date.accessioned: 2015-08-12T06:16:20Z
dc.date.available: 2015-08-12T06:16:20Z
dc.identifier.uri: http://hdl.handle.net/1885/14702
dc.description.abstract: The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
dc.publisher: Journal of Machine Learning Research
dc.relation.ispartof: Proceedings of the 6th Asian Conference on Machine Learning
dc.rights: © 2014 M. Daswani, P. Sunehag & M. Hutter. Author can archive publisher’s version/PDF. http://www.sherpa.ac.uk/romeo/issn/1532-4435/ from SHERPA/RoMEO site (as at 12/08/15)
dc.source.uri: http://jmlr.org/proceedings/papers/v39/daswani14.pdf
dc.subject: feature reinforcement learning
dc.subject: imitation learning
dc.subject: dataset aggregation
dc.subject: value advice
dc.subject: upper confidence tree
dc.subject: Monte Carlo search
dc.subject: Arcade Learning Environment
dc.title: Reinforcement learning with value advice
dc.type: Conference paper
dc.date.issued: 2014-11
local.publisher.url: http://jmlr.org/
local.type.status: Published Version
local.contributor.affiliation: Hutter, M., Research School of Computer Science, The Australian National University
dc.relation: http://purl.org/au-research/grants/arc/DP120100950
local.bibliographicCitation.startpage: 299
local.bibliographicCitation.lastpage: 314
Collections: ANU Research Publications

Download

File: Daswani et al Reinforcement Learning 2014.pdf
Size: 333.02 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
