Reinforcement learning with value advice
Date
2014-11
Authors
Daswani, Mayank
Sunehag, Peter
Hutter, Marcus
Publisher
Journal of Machine Learning Research
Abstract
The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
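As a rough illustration of the setting described in the abstract, and not the paper's RLAdvice implementation, the sketch below runs a DAgger-style dataset-aggregation loop on a hypothetical six-state chain. Everything specific here is an assumption made for the example: the toy environment, the exact optimal Q-values standing in for the oracle (the paper uses value estimates from UCT), the lookup table standing in for the learned value function, and the mixing schedule `beta`.

```python
import random
from collections import defaultdict

N_STATES = 6          # hypothetical chain: states 0..5, state 5 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # assumed discount factor


def step(state, action):
    """Deterministic chain dynamics; reward 1 only on entering the goal."""
    goal = N_STATES - 1
    nxt = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    reward = 1.0 if (nxt == goal and state != goal) else 0.0
    return nxt, reward


def optimal_value(state):
    """Closed-form V* for this toy chain (0 at the terminal goal state)."""
    if state == N_STATES - 1:
        return 0.0
    return GAMMA ** (N_STATES - 2 - state)


def oracle_q(state, action):
    """Stand-in for value advice: exact Q* of the toy chain."""
    nxt, reward = step(state, action)
    return reward + GAMMA * optimal_value(nxt)


def greedy(q_fn, state):
    """Action with the highest value under q_fn."""
    return max(ACTIONS, key=lambda a: q_fn(state, a))


def train(iterations=5, horizon=20, seed=0):
    rng = random.Random(seed)
    dataset = defaultdict(list)   # aggregated (state, action) -> advised values
    learned = {}                  # tabular regression: mean advised value

    def learner_q(state, action):
        return learned.get((state, action), 0.0)

    for i in range(iterations):
        # Assumed schedule: follow the oracle fully at first, then less often.
        beta = 1.0 if i == 0 else 0.5 ** i
        state = 0
        for _ in range(horizon):
            if state == N_STATES - 1:
                break
            # Ask the oracle for the value of every action in the visited state.
            for a in ACTIONS:
                dataset[(state, a)].append(oracle_q(state, a))
            # Act with a mixture of the oracle's and the learner's greedy policy.
            use_oracle = rng.random() < beta
            action = greedy(oracle_q if use_oracle else learner_q, state)
            state, _ = step(state, action)
        # Refit the learner on the whole aggregated dataset.
        learned = {key: sum(vals) / len(vals) for key, vals in dataset.items()}
    return learner_q


if __name__ == "__main__":
    q = train()
    print([greedy(q, s) for s in range(N_STATES - 1)])
```

The learner only ever sees advised values for states that the mixed policy actually visits, which is the reason for aggregating data across iterations rather than training once on oracle rollouts.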
Keywords
feature reinforcement learning, imitation learning, dataset aggregation, value advice, upper confidence tree, Monte Carlo search, Arcade Learning Environment
Type
Conference paper
Book Title
Proceedings of the 6th Asian Conference on Machine Learning
Access Statement
Open Access
License Rights
Creative Commons Attribution licence