Stochastic Optimisation of Controlled Partially Observable Markov Decision Processes

Bartlett, Peter; Baxter, Jon


We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0,1], which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In...
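
As an illustration only, the sketch below assumes a standard eligibility-trace form of on-line policy-gradient ascent along a single sample path: after each action, a trace z is decayed by β and accumulates the gradient of the log-probability of the chosen action given the current observation, and after each reward r the policy parameters move by step_size · r · z. The agent never sees the hidden state, only noisy observations. The toy two-state POMDP, the softmax policy, the constant step size, and all names (softmax, observe, step, online_policy_gradient) are hypothetical choices for this sketch and are not taken from the paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array of action preferences."""
    e = np.exp(x - x.max())
    return e / e.sum()


def observe(state, rng):
    """Hypothetical noisy sensor: reports the hidden state correctly 80% of the time."""
    return state if rng.random() < 0.8 else 1 - state


def step(action, rng):
    """Hypothetical toy dynamics: the action steers the hidden state
    (succeeding with probability 0.9); reward is 1 in hidden state 1."""
    next_state = action if rng.random() < 0.9 else 1 - action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def online_policy_gradient(beta=0.5, step_size=0.01, n_steps=20000, seed=0):
    """Eligibility-trace policy-gradient ascent on one sample path (a sketch).

    beta in [0, 1) sets how far back credit for a reward is assigned:
    larger values reduce bias but increase the variance of the updates.
    """
    rng = np.random.default_rng(seed)
    n_obs, n_actions = 2, 2
    theta = np.zeros((n_obs, n_actions))  # softmax policy parameters per observation
    z = np.zeros_like(theta)              # eligibility trace
    state = 0                             # hidden state, never shown to the agent
    total_reward = 0.0

    for _ in range(n_steps):
        y = observe(state, rng)
        probs = softmax(theta[y])
        a = rng.choice(n_actions, p=probs)

        # Gradient of log pi(a | y) w.r.t. theta: only row y is nonzero,
        # and equals onehot(a) - probs for a softmax policy.
        grad_log = np.zeros_like(theta)
        grad_log[y] = -probs
        grad_log[y, a] += 1.0

        z = beta * z + grad_log           # decay old credit, add new
        state, r = step(a, rng)
        theta += step_size * r * z        # on-line stochastic ascent step
        total_reward += r

    return theta, total_reward / n_steps
```

A call such as `theta, avg = online_policy_gradient()` should drive both observation rows toward the rewarding action 1. In this toy problem the reward depends only on the immediately preceding action, so a small β suffices; problems with longer delays between actions and their rewards would call for β closer to 1, at the cost of noisier updates.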

Collections: ANU Research Publications
Date published: 2000
Type: Conference paper
Source: 39th IEEE Conference on Decision and Control


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.