Stochastic Optimisation of Controlled Partially Observable Markov Decision Processes

Bartlett, Peter; Baxter, Jon

Description

We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0,1], which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In...

Collections: ANU Research Publications
Date published: 2000
Type: Conference paper
URI: http://hdl.handle.net/1885/28407
Source: 39th IEEE Conference on Decision and Control



