
Stochastic Optimisation of Controlled Partially Observable Markov Decision Processes

Bartlett, Peter; Baxter, Jon

Description

We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0,1], which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state.
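To make the description above concrete, here is a minimal sketch of an on-line, single-sample-path policy-gradient update with a β-discounted eligibility trace. This record does not state the update rule, so everything in the block is an assumption made for illustration: the trace-based update, the toy two-state POMDP, the logistic policy, the constant step size, and all names (`run_online_policy_gradient`, `trace`, `step_size`). It should not be read as the authors' exact algorithm.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_online_policy_gradient(steps=20000, beta=0.5, step_size=0.05, seed=0):
    """Illustrative trace-based policy-gradient loop on a toy POMDP.

    Assumed setup (not from the record): two hidden states, a noisy binary
    observation, two actions, reward 1 when the action matches the hidden state.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # one logit per observation: P(action=1 | obs) = sigmoid(theta[obs])
    trace = [0.0, 0.0]  # beta-discounted sum of past score vectors
    state = 0           # hidden state; used only by the simulator, never by the learner
    total_reward = 0.0

    for _ in range(steps):
        # The learner sees only a noisy observation (correct with probability 0.8).
        obs = state if rng.random() < 0.8 else 1 - state
        p1 = sigmoid(theta[obs])
        action = 1 if rng.random() < p1 else 0
        reward = 1.0 if action == state else 0.0

        # Decay the trace by beta, then add the score d/dtheta log pi(action | obs).
        # For this logistic policy the score is (action - p1) in the obs coordinate.
        trace = [beta * z for z in trace]
        trace[obs] += action - p1

        # Stochastic-gradient step on the policy parameters, weighted by the reward.
        theta = [th + step_size * reward * z for th, z in zip(theta, trace)]
        total_reward += reward

        # The hidden state flips with probability 0.1, independently of the action.
        if rng.random() < 0.1:
            state = 1 - state

    return theta, total_reward / steps


if __name__ == "__main__":
    theta, average_reward = run_online_policy_gradient()
    print(theta, average_reward)
```

In this sketch the learner only ever reads observations and rewards from one continuing trajectory, which mirrors the "single sample path" and "no knowledge of the underlying state" properties named in the description. The value `beta=0.5` is an arbitrary choice for the toy problem, not a recommendation.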

dc.contributor.author: Bartlett, Peter
dc.contributor.author: Baxter, Jon
dc.coverage.spatial: Sydney Australia
dc.date.accessioned: 2015-12-07T22:55:30Z
dc.date.available: 2015-12-07T22:55:30Z
dc.date.created: December 12 2000
dc.identifier.isbn: 0780366417
dc.identifier.uri: http://hdl.handle.net/1885/28407
dc.description.abstract: We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0,1], which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
dc.publisher: Casual Productions
dc.relation.ispartofseries: IEEE Conference on Decision and Control 2000
dc.source: 39th IEEE Conference on Decision and Control
dc.subject: Keywords: Algorithms; Computer simulation; Convergence of numerical methods; Dynamic programming; Markov processes; Online systems; Optimization; Problem solving; Stochastic control systems; Partially observable Markov decision process; Stochastic optimization; Dec
dc.title: Stochastic Optimisation of Controlled Partially Observable Markov Decision Processes
dc.type: Conference paper
local.description.notes: Imported from ARIES
local.description.refereed: Yes
dc.date.issued: 2000
local.identifier.absfor: 010406 - Stochastic Analysis and Modelling
local.identifier.ariespublication: MigratedxPub58
local.type.status: Published Version
local.contributor.affiliation: Bartlett, Peter, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Baxter, Jon, College of Engineering and Computer Science, ANU
local.bibliographicCitation.startpage: 124
local.bibliographicCitation.lastpage: 129
dc.date.updated: 2015-12-07T12:56:26Z
local.identifier.scopusID: 2-s2.0-0034439308
Collections: ANU Research Publications



