Q-learning for history-based reinforcement learning
We extend the Q-learning algorithm from the Markov Decision Process setting to problems where observations are non-Markov and do not reveal the full state of the world i.e. to POMDPs. We do this in a natural manner by adding l0 regularisation to the pathwise squared Q-learning objective function and then optimise this over both a choice of map from history to states and the resulting MDP ...[Show more]
|Collections||ANU Research Publications|
|Daswani et al QLearning for history based 2013.pdf||299.15 kB||Adobe PDF|
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.