Experiments with Infinite-Horizon, Policy-Gradient Estimation

dc.contributor.author: Baxter, Jon
dc.contributor.author: Bartlett, Peter
dc.contributor.author: Weaver, L
dc.date.accessioned: 2015-12-10T23:11:57Z
dc.date.issued: 2001
dc.date.updated: 2015-12-10T09:25:21Z
dc.description.abstract: In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter β ∈ [0, 1), which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of Baxter and Bartlett (2001) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.
dc.identifier.issn: 1076-9757
dc.identifier.uri: http://hdl.handle.net/1885/63908
dc.publisher: Morgan Kaufmann Publishers
dc.source: Journal of Artificial Intelligence Research
dc.title: Experiments with Infinite-Horizon, Policy-Gradient Estimation
dc.type: Journal article
local.bibliographicCitation.lastpage: 381
local.bibliographicCitation.startpage: 351
local.contributor.affiliation: Baxter, Jon, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Bartlett, Peter, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Weaver, L, College of Engineering and Computer Science, ANU
local.contributor.authoruid: Baxter, Jon, u9612464
local.contributor.authoruid: Bartlett, Peter, u9301805
local.contributor.authoruid: Weaver, L, u9405743
local.description.embargo: 2037-12-31
local.description.notes: Imported from ARIES
local.description.refereed: Yes
local.identifier.absfor: 010303 - Optimisation
local.identifier.ariespublication: MigratedxPub862
local.identifier.citationvolume: 15
local.identifier.scopusID: 2-s2.0-0013495368
local.type.status: Published Version
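
Illustrative note on the abstract: the following is a minimal sketch, not code from the paper, of the kind of estimator the abstract describes. It assumes the usual statement of the GPOMDP recursion (an eligibility trace discounted by β, averaged against the observed rewards) and feeds the result to a plain gradient-ascent loop. The one-parameter sigmoid policy, the toy reward (action 1 pays 1, action 0 pays 0), and the names `gpomdp_estimate` and `gradient_ascent` are all assumptions made for illustration; the conjugate-gradient variant with line searches is not shown.

```python
import math
import random


def gpomdp_estimate(theta, beta, steps, seed=0):
    """Estimate d(average reward)/d(theta) from one simulated trajectory.

    Hypothetical toy setting: the policy picks action 1 with probability
    sigmoid(theta), and the reward equals the action just taken.
    """
    rng = random.Random(seed)
    p_one = 1.0 / (1.0 + math.exp(-theta))  # P(action = 1 | theta)
    z = 0.0      # eligibility trace, discounted by beta
    delta = 0.0  # running average of reward * trace
    for t in range(steps):
        action = 1 if rng.random() < p_one else 0
        # Score function: d/dtheta log P(action | theta) for this policy.
        score = (1.0 - p_one) if action == 1 else -p_one
        reward = float(action)
        z = beta * z + score
        delta += (reward * z - delta) / (t + 1)
    return delta


def gradient_ascent(theta, beta, iterations, steps, step_size):
    """Plain stochastic gradient ascent driven by the estimator above."""
    for i in range(iterations):
        theta += step_size * gpomdp_estimate(theta, beta, steps, seed=i)
    return theta


if __name__ == "__main__":
    # In this toy the exact gradient at theta = 0 is 0.25.
    print(gpomdp_estimate(theta=0.0, beta=0.0, steps=20000))
    print(gradient_ascent(theta=0.0, beta=0.0, iterations=20,
                          steps=2000, step_size=1.0))
```

In this toy problem the reward depends only on the most recent action, so β = 0 already gives an unbiased estimate and a larger β only adds variance. In problems where rewards are delayed, a β closer to 1 is what reduces the bias, which is the trade-off the abstract refers to.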

Downloads

Original bundle

Name: 01_Baxter_Experiments_with_2001.pdf
Size: 246.58 KB
Format: Adobe Portable Document Format