
Q-learning for history-based reinforcement learning

Daswani, Mayank; Sunehag, Peter; Hutter, Marcus


We extend the Q-learning algorithm from the Markov Decision Process (MDP) setting to problems where observations are non-Markov and do not reveal the full state of the world, i.e. to POMDPs. We do this in a natural manner by adding ℓ0 regularisation to the pathwise squared Q-learning objective function and then optimising this over both a choice of map from history to states and the resulting MDP ...
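As a rough illustration of the kind of objective the abstract describes, the Python sketch below evaluates a squared Q-learning error along a single trajectory and adds an ℓ0 penalty that counts the non-zero Q-values. It is a sketch under stated assumptions, not the authors' implementation: the names `last_k_map` and `regularised_cost`, the choice of "last k observations" as the history-to-state map, the penalty weight `beta`, and the dictionary representation of Q are all hypothetical, and the paper's actual objective and optimisation procedure may differ.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# One step of experience: (observation, action taken, reward received afterwards).
Step = Tuple[Hashable, int, float]
QTable = Dict[Tuple[Hashable, int], float]


def last_k_map(k: int) -> Callable[[Sequence[Hashable]], Hashable]:
    """Hypothetical history-to-state map: the state is the last k observations."""
    def phi(observations: Sequence[Hashable]) -> Hashable:
        return tuple(observations[-k:])
    return phi


def regularised_cost(
    trajectory: List[Step],
    phi: Callable[[Sequence[Hashable]], Hashable],
    q: QTable,
    n_actions: int,
    gamma: float = 0.9,
    beta: float = 0.1,
) -> float:
    """Pathwise squared Q-learning error plus an l0 penalty on Q (assumed form)."""
    # Map each history prefix to a state using the candidate map phi.
    observations: List[Hashable] = []
    states: List[Hashable] = []
    for obs, _, _ in trajectory:
        observations.append(obs)
        states.append(phi(observations))

    # Sum the squared temporal-difference errors along the path.
    squared_error = 0.0
    for t in range(len(trajectory) - 1):
        _, action, reward = trajectory[t]
        best_next = max(q.get((states[t + 1], a), 0.0) for a in range(n_actions))
        target = reward + gamma * best_next
        squared_error += (target - q.get((states[t], action), 0.0)) ** 2

    # l0 "norm": the number of non-zero entries in the Q-table.
    l0_penalty = sum(1 for value in q.values() if value != 0.0)
    return squared_error + beta * l0_penalty
```

Under these assumptions, candidate maps (for example `last_k_map(1)` versus `last_k_map(2)`) could be compared by evaluating `regularised_cost` on the same trajectory with their respective Q-tables; the penalty term favours maps whose Q-tables need fewer non-zero entries.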

Collections: ANU Research Publications
Date published: 2013-11
Type: Conference paper


File: Daswani et al QLearning for history based 2013.pdf
Size: 299.15 kB
Format: Adobe PDF

