Policy Gradient Methods: Variance Reduction and Stochastic Convergence

Greensmith, Evan

dc.contributor.author	Greensmith, Evan	en_US
dc.date.accessioned	2008-06-16T06:35:31Z	en_US
dc.date.accessioned	2011-01-04T02:38:37Z
dc.date.available	2008-06-16T06:35:31Z	en_US
dc.date.available	2011-01-04T02:38:37Z
dc.date.issued	2005
dc.description.abstract	In a reinforcement learning task an agent must learn a policy for performing actions so as to perform well in a given environment. Policy gradient methods consider a parameterized class of policies and, using a policy from the class together with a trajectory through the environment taken by the agent under that policy, estimate the gradient of the policy's performance with respect to the parameters. Policy gradient methods avoid some of the problems of value function methods, such as policy degradation, where inaccuracy in the value function leads to the choice of a poor policy. However, the estimates produced by policy gradient methods can have high variance. ¶ ...	en_US
dc.identifier.other	b2247206x
dc.identifier.uri	http://hdl.handle.net/1885/47105
dc.language.iso	en	en_US
dc.rights.uri	The Australian National University	en_US
dc.subject	reinforcement learning	en_US
dc.subject	policy gradient	en_US
dc.subject	stochastic convergence	en_US
dc.subject	variance reduction	en_US
dc.title	Policy Gradient Methods: Variance Reduction and Stochastic Convergence	en_US
dc.type	Thesis (PhD)	en_US
dcterms.valid	2005	en_US
local.contributor.affiliation	Research School of Information Sciences and Engineering, Computer Sciences Laboratory	en_US
local.contributor.affiliation	The Australian National University	en_US
local.description.refereed	yes	en_US
local.identifier.doi	10.25911/5d7a2a01dcebe
local.mintdoi	mint
local.type.degree	Doctor of Philosophy (PhD)	en_US
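The abstract describes estimating the gradient of a policy's performance from trajectories and notes that such estimates can have high variance. The sketch below is a hedged illustration only. It is not the estimator or analysis from the thesis. It assumes a hypothetical three-action bandit with one-step episodes and a softmax policy. It forms the likelihood-ratio (REINFORCE-style) estimate (r - b) * grad log pi(a), then compares the empirical variance of that estimate with b = 0 against a constant baseline b equal to the expected reward. The reward means, the noise level, and every function name (`softmax`, `gradient_sample`, `estimate_stats`, and so on) are assumptions chosen for the example.

```python
import math
import random
import statistics


def softmax(theta):
    """Softmax policy probabilities pi_theta(a) for a parameter vector theta."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(probs, rng):
    """Draw an action index from the categorical distribution probs."""
    u = rng.random()
    cumulative = 0.0
    for a, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return a
    return len(probs) - 1  # guard against floating-point round-off


def grad_log_pi(theta, a):
    """Score function for a softmax policy: d/dtheta_k log pi(a) = 1[k == a] - pi(k)."""
    probs = softmax(theta)
    return [(1.0 if k == a else 0.0) - p for k, p in enumerate(probs)]


def gradient_sample(theta, mean_rewards, baseline, rng, noise_sd=1.0):
    """One likelihood-ratio gradient estimate from a one-step episode.

    Hypothetical toy setting: the reward for action a is mean_rewards[a]
    plus Gaussian noise with standard deviation noise_sd.
    """
    a = sample_action(softmax(theta), rng)
    r = mean_rewards[a] + rng.gauss(0.0, noise_sd)
    return [(r - baseline) * g for g in grad_log_pi(theta, a)]


def true_gradient(theta, mean_rewards):
    """Exact gradient of J(theta) = sum_a pi(a) * mean_rewards[a]."""
    probs = softmax(theta)
    avg = sum(p * m for p, m in zip(probs, mean_rewards))
    return [p * (m - avg) for p, m in zip(probs, mean_rewards)]


def estimate_stats(theta, mean_rewards, baseline, n, seed):
    """Per-component sample mean and variance of n independent gradient estimates."""
    rng = random.Random(seed)
    samples = [gradient_sample(theta, mean_rewards, baseline, rng) for _ in range(n)]
    means = [statistics.fmean(col) for col in zip(*samples)]
    variances = [statistics.pvariance(col) for col in zip(*samples)]
    return means, variances


if __name__ == "__main__":
    theta = [0.0, 0.0, 0.0]
    mean_rewards = [10.0, 10.5, 11.0]  # a large common offset inflates variance
    # Oracle constant baseline: the expected reward under the current policy (10.5 here).
    b = sum(p * m for p, m in zip(softmax(theta), mean_rewards))
    print("true gradient:", true_gradient(theta, mean_rewards))
    for label, baseline in (("no baseline", 0.0), ("baseline", b)):
        means, variances = estimate_stats(theta, mean_rewards, baseline, 20000, seed=0)
        print(label, "mean:", means, "variance:", variances)
```

The score function has zero expectation under the policy, so subtracting any constant b leaves the estimate unbiased. When the rewards share a large common offset (about 10 here), the baseline removes most of the variance. The oracle baseline in this example uses the true mean rewards. A learning agent would not know these and would have to estimate them.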