Q-learning for history-based reinforcement learning
Daswani, Mayank; Sunehag, Peter; Hutter, Marcus
Description
We extend the Q-learning algorithm from the Markov Decision Process (MDP) setting to problems where observations are non-Markov and do not reveal the full state of the world, i.e. to POMDPs. We do this in a natural manner by adding ℓ0 regularisation to the pathwise squared Q-learning objective function, and then optimise this over both a choice of map from history to states and the resulting MDP ...
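For illustration only, the following is a minimal Python sketch of the kind of objective the abstract describes: a squared temporal-difference cost computed along the observed path for a candidate map φ from histories to states, plus an ℓ0-style penalty on the size of the induced state space. All names are hypothetical, and the Q-fitting loop is a crude stand-in for the paper's joint optimisation over the map and the MDP parameters.

```python
from collections import defaultdict

def pathwise_cost(phi, trajectory, gamma=0.99, lam=1.0, sweeps=100, lr=0.1):
    """Sketch of a pathwise squared Q-learning cost for a candidate
    history-to-state map `phi`, plus an l0-style penalty on the number
    of states the map induces. Illustrative names, not the paper's
    notation.

    trajectory: list of (history, action, reward, next_history) tuples,
                with histories given as hashable tuples.
    phi:        callable mapping a history to a discrete state.
    """
    actions = {a for _, a, _, _ in trajectory}
    # Fit tabular Q-values for the MDP induced by phi with repeated
    # Q-learning-style sweeps over the logged path.
    Q = defaultdict(float)
    for _ in range(sweeps):
        for h, a, r, h_next in trajectory:
            s, s_next = phi(h), phi(h_next)
            target = r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += lr * (target - Q[(s, a)])
    # Pathwise squared temporal-difference error under the fitted Q.
    cost = 0.0
    for h, a, r, h_next in trajectory:
        s, s_next = phi(h), phi(h_next)
        td = Q[(s, a)] - (r + gamma * max(Q[(s_next, b)] for b in actions))
        cost += td * td
    # l0-style regularisation: penalise the number of distinct states.
    states = {phi(h) for h, _, _, _ in trajectory} | {phi(h2) for *_, h2 in trajectory}
    return cost + lam * len(states)

# Hypothetical usage: score two candidate maps on logged experience.
# phi_last_obs = lambda h: h[-1:]   # state = most recent observation
# phi_last_two = lambda h: h[-2:]   # state = last two observations
# best = min([phi_last_obs, phi_last_two],
#            key=lambda p: pathwise_cost(p, trajectory))
```

Minimising such a cost over candidate maps trades goodness of fit against the number of states the agent must maintain; the ℓ0 term favours coarser maps that still make the induced process approximately Markov.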
| Collections | ANU Research Publications |
|---|---|
| Date published | 2013-11 |
| Type | Conference paper |
| URI | http://hdl.handle.net/1885/14711 |
| Book Title | JMLR Workshop and Conference Proceedings: Volume 29: Asian Conference on Machine Learning |
Download
| File | Description | Size | Format |
|---|---|---|---|
| Daswani et al QLearning for history based 2013.pdf | | 299.15 kB | Adobe PDF |
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.