Reinforcement learning with value advice

Daswani, Mayank; Sunehag, Peter; Hutter, Marcus

Description

The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the ...
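To make the setting concrete, the following is a minimal sketch of a DAgger-style loop driven by a value oracle. It is not the paper's RLAdvice algorithm, whose details are not given in this description. The toy chain environment, the oracle built by value iteration, the lookup-table policy, and the names `step`, `optimal_q`, `learn_policy`, and `budget` are all assumptions made for illustration.

```python
import random

# Hypothetical toy chain, not from the paper: action 0 = left, 1 = right.
N_STATES, ACTIONS, GAMMA = 6, (0, 1), 0.9

def step(state, action):
    """Deterministic chain; reward 1 whenever the next state is the rightmost."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def optimal_q(sweeps=200):
    """Stand-in for the oracle: Q* of the toy chain via value iteration."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in ACTIONS:
                nxt, reward = step(s, a)
                q[(s, a)] = reward + GAMMA * max(q[(nxt, b)] for b in ACTIONS)
    return q

def learn_policy(iterations=6, horizon=10, budget=100, seed=0):
    """DAgger-style loop: roll out the learner, label visited states, aggregate."""
    rng = random.Random(seed)
    q_star = optimal_q()
    labels = {}    # aggregated dataset and policy: state -> advised action
    queries = 0    # oracle calls used so far (the "limited access")
    for _ in range(iterations):
        # 1. Roll out the current policy; act randomly in unlabeled states.
        state, visited = 0, []
        for _ in range(horizon):
            visited.append(state)
            action = labels.get(state, rng.choice(ACTIONS))
            state, _reward = step(state, action)
        # 2. Ask the oracle for the value of each action in newly visited
        #    states and keep the greedy action, while the budget lasts.
        for s in visited:
            if s not in labels and queries + len(ACTIONS) <= budget:
                queries += len(ACTIONS)
                labels[s] = max(ACTIONS, key=lambda a: q_star[(s, a)])
    return labels, queries

if __name__ == "__main__":
    policy, used = learn_policy()
    print(policy, used)
```

On this toy chain the learned table maps every state to action 1 (move right) after 12 oracle queries. A lookup table stands in for the learned policy only because the state space is tiny; a larger problem would need a classifier or regressor trained on the aggregated data.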

Collections: ANU Research Publications
Date published: 2014-11
Type: Conference paper
URI: http://hdl.handle.net/1885/14702

Download

File: Daswani et al Reinforcement Learning 2014.pdf
Size: 333.02 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
