
Q-learning for history-based reinforcement learning

Daswani, Mayank; Sunehag, Peter; Hutter, Marcus

Description

We extend the Q-learning algorithm from the Markov Decision Process setting to problems where observations are non-Markov and do not reveal the full state of the world, i.e. to POMDPs. We do this in a natural manner by adding ℓ0 regularisation to the pathwise squared Q-learning objective function and then optimising this over both a choice of map from history to states and the resulting MDP ...
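The abstract describes the objective only at a high level. As a rough, illustrative sketch (not taken from the paper itself), a regularised pathwise objective of this kind might be written as follows, where φ is a candidate map from histories h_t to states, Q is the state-action value function, γ the discount factor, λ the regularisation weight, and the ℓ0 penalty is written here as a generic term counting the non-zero entries of Q; the precise form of the penalty and of the optimisation over the MDP parameters is an assumption.

\[
\min_{\phi,\;Q}\ \sum_{t=1}^{n-1}\Big(r_{t+1}+\gamma\max_{a'}Q\big(\phi(h_{t+1}),a'\big)-Q\big(\phi(h_t),a_t\big)\Big)^{2}\;+\;\lambda\,\lVert Q\rVert_{0}
\]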

Collections: ANU Research Publications
Date published: 2013-11
Type: Conference paper
URI: http://hdl.handle.net/1885/14711

Download

File: Daswani et al QLearning for history based 2013.pdf
Size: 299.15 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
