Policy Gradient Methods: Variance Reduction and Stochastic Convergence
| dc.contributor.author | Greensmith, Evan | en_US |
| dc.date.accessioned | 2008-06-16T06:35:31Z | en_US |
| dc.date.accessioned | 2011-01-04T02:38:37Z | |
| dc.date.available | 2008-06-16T06:35:31Z | en_US |
| dc.date.available | 2011-01-04T02:38:37Z | |
| dc.date.issued | 2005 | |
| dc.description.abstract | In a reinforcement learning task, an agent must learn a policy for choosing actions so that it performs well in a given environment. Policy gradient methods consider a parameterized class of policies and, using a policy from the class together with a trajectory the agent takes through the environment under that policy, estimate the gradient of the policy's performance with respect to the parameters. Policy gradient methods avoid some of the problems of value function methods, such as policy degradation, where inaccuracy in the value function leads to the choice of a poor policy. However, the estimates produced by policy gradient methods can have high variance. ¶ ... | en_US |
| dc.identifier.other | b2247206x | |
| dc.identifier.uri | http://hdl.handle.net/1885/47105 | |
| dc.language.iso | en | en_US |
| dc.rights.uri | The Australian National University | en_US |
| dc.subject | reinforcement learning | en_US |
| dc.subject | policy gradient | en_US |
| dc.subject | stochastic convergence | en_US |
| dc.subject | variance reduction | en_US |
| dc.title | Policy Gradient Methods: Variance Reduction and Stochastic Convergence | en_US |
| dc.type | Thesis (PhD) | en_US |
| dcterms.valid | 2005 | en_US |
| local.contributor.affiliation | Research School of Information Sciences and Engineering, Computer Sciences Laboratory | en_US |
| local.contributor.affiliation | The Australian National University | en_US |
| local.description.refereed | yes | en_US |
| local.identifier.doi | 10.25911/5d7a2a01dcebe | |
| local.mintdoi | mint | |
| local.type.degree | Doctor of Philosophy (PhD) | en_US |
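The abstract's central concern is that a gradient estimate formed from a sampled trajectory can have high variance. The sketch below is not the thesis's method. It is a hypothetical one-step (bandit) example of the standard likelihood-ratio estimator (r(a) - b) * grad log pi(a), assuming a softmax policy over two actions with fixed rewards. It compares a zero baseline with a constant baseline equal to the expected reward. All function names, parameter values, and rewards are illustrative assumptions.

```python
import math
import random
import statistics


def softmax(theta):
    """Action probabilities of a softmax policy (hypothetical parameterization)."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def score(theta, action):
    """grad_theta log pi(action | theta) for a softmax policy: onehot(action) - pi."""
    probs = softmax(theta)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def true_gradient(theta, rewards):
    """Exact gradient of J(theta) = sum_a pi(a) r(a); component i is pi_i * (r_i - J)."""
    probs = softmax(theta)
    j = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - j) for p, r in zip(probs, rewards)]


def sample_estimates(theta, rewards, baseline, n, rng):
    """n one-step likelihood-ratio estimates (r(a) - b) * grad log pi(a), with a ~ pi."""
    probs = softmax(theta)
    actions = range(len(probs))
    estimates = []
    for _ in range(n):
        a = rng.choices(actions, weights=probs)[0]
        estimates.append([(rewards[a] - baseline) * g for g in score(theta, a)])
    return estimates


def summarize(estimates, k=0):
    """Sample mean and population variance of component k of the estimates."""
    column = [e[k] for e in estimates]
    return statistics.fmean(column), statistics.pvariance(column)


# Hypothetical two-action problem: parameters and rewards are illustrative only.
rng = random.Random(0)
theta = [0.0, 0.0]
rewards = [1.0, 2.0]
expected_reward = sum(p * r for p, r in zip(softmax(theta), rewards))

plain = sample_estimates(theta, rewards, baseline=0.0, n=5000, rng=rng)
based = sample_estimates(theta, rewards, baseline=expected_reward, n=5000, rng=rng)
print(true_gradient(theta, rewards))  # [-0.25, 0.25]
print(summarize(plain))  # mean near -0.25, variance near 0.5625
print(summarize(based))  # mean -0.25, variance 0.0 at this symmetric theta
```

The baseline does not depend on the sampled action, and the expected score is zero, so both estimators have the same expectation. Only their spread differs. The complete removal of variance seen here is a property of this symmetric two-action case at theta = (0, 0), not a general result.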