Gradient-based Reinforcement Planning in Policy-Search Methods

dc.contributor.author	Kwee, Ivo
dc.contributor.author	Hutter, Marcus
dc.contributor.author	Schmidhuber, Jürgen
dc.date.accessioned	2015-09-04T00:06:57Z
dc.date.available	2015-09-04T00:06:57Z
dc.date.issued	2001
dc.description.abstract	We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments.	en_AU
dc.identifier.isbn	90-393-2874-9	en_AU
dc.identifier.issn	1389-5184	en_AU
dc.identifier.uri	http://hdl.handle.net/1885/15170
dc.publisher	Utrecht University	en_AU
dc.relation.ispartof	Proceedings of the fifth European Workshop on Reinforcement Learning (EWRL-5)	en_AU
dc.rights	© The Author(s)	en_AU
dc.title	Gradient-based Reinforcement Planning in Policy-Search Methods	en_AU
dc.type	Conference paper	en_AU
local.bibliographicCitation.lastpage	29	en_AU
local.bibliographicCitation.startpage	27	en_AU
local.contributor.affiliation	Hutter, M., Research School of Computer Science, The Australian National University	en_AU
local.contributor.authoruid	u4350841	en_AU
local.identifier.citationvolume	27	en_AU
local.type.status	Published Version	en_AU