
Q-learning for history-based reinforcement learning

Daswani, Mayank; Sunehag, Peter; Hutter, Marcus


We extend the Q-learning algorithm from the Markov Decision Process (MDP) setting to problems where observations are non-Markov and do not reveal the full state of the world, i.e., to POMDPs. We do this in a natural manner by adding ℓ0 regularisation to the pathwise squared Q-learning objective function and then optimising it over both a choice of map from histories to states and the resulting MDP parameters. The optimisation procedure involves a stochastic search over the map class nested with classical...
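
The abstract does not spell out the exact objective, map class or search procedure, so the following is only a minimal sketch of the general idea under stated assumptions. The map class is hypothetical (`make_map(k)` sends a history to its last k observations). The inner minimisation over Q is approximated by replaying classical tabular Q-learning over the fixed observed history. The ℓ0 term is taken to be the number of nonzero Q entries, weighted by a hypothetical `lam`. The "stochastic search" is a simple random local search over k. All names and parameters here are illustrative, not the authors' implementation.

```python
import random


def make_map(k):
    """Hypothetical map phi_k: history prefix up to step t -> tuple of its last k observations."""
    def phi(history, t):
        start = max(0, t - k + 1)
        return tuple(obs for obs, _, _ in history[start:t + 1])
    return phi


def td_error(q, t, states, actions, rewards, gamma, acts):
    """TD error at step t along the single observed path (unseen Q entries count as 0)."""
    target = rewards[t] + gamma * max(q.get((states[t + 1], b), 0.0) for b in acts)
    return target - q.get((states[t], actions[t]), 0.0)


def cost(history, k, gamma=0.9, lam=0.01, alpha=0.1, sweeps=100):
    """Assumed form: pathwise squared TD error of fitted Q plus lam * (number of nonzero Q entries).

    `history` is a list of (observation, action, reward) triples, where the reward
    is the one received after taking that action.
    """
    phi = make_map(k)
    n = len(history)
    states = [phi(history, t) for t in range(n)]
    actions = [a for _, a, _ in history]
    rewards = [r for _, _, r in history]
    acts = sorted(set(actions))
    q = {}
    # Inner loop: classical tabular Q-learning, replayed over the fixed history.
    for _ in range(sweeps):
        for t in range(n - 1):
            key = (states[t], actions[t])
            q[key] = q.get(key, 0.0) + alpha * td_error(q, t, states, actions, rewards, gamma, acts)
    squared = 0.5 * sum(
        td_error(q, t, states, actions, rewards, gamma, acts) ** 2 for t in range(n - 1)
    )
    l0 = sum(1 for v in q.values() if v != 0.0)  # l0-style count of nonzero parameters
    return squared + lam * l0


def stochastic_search(history, max_k=4, iters=20, seed=0, **kw):
    """Random local search over k in [1, max_k]; a proposal is kept only if it lowers the cost."""
    rng = random.Random(seed)
    best_k = 1
    best_cost = cost(history, best_k, **kw)
    for _ in range(iters):
        proposal = min(max_k, max(1, best_k + rng.choice((-1, 1))))
        c = cost(history, proposal, **kw)
        if c < best_cost:
            best_k, best_cost = proposal, c
    return best_k, best_cost


if __name__ == "__main__":
    rng = random.Random(1)
    history, prev = [], 0
    for _ in range(200):
        obs, act = rng.randint(0, 1), rng.randint(0, 1)
        # Reward depends on the *previous* observation, so phi_1 is not Markov here.
        history.append((obs, act, 1.0 if act == prev else 0.0))
        prev = obs
    print(stochastic_search(history))
```

In this toy history the reward depends on the previous observation, so a map keeping at least two observations is needed for the induced process to be Markov. The ℓ0 penalty is what discourages the search from always preferring larger k, since longer suffixes create more Q entries.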

Collections: ANU Research Publications
Date published: 2013
Type: Journal article
Source: Journal of Machine Learning Research
Access Rights: Open Access


File: 01_Daswani_Q-learning_for_history-based_2013.pdf (Adobe PDF, 299.15 kB)

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
