Gradient-based Reinforcement Planning in Policy-Search Methods
| dc.contributor.author | Kwee, Ivo | |
| dc.contributor.author | Hutter, Marcus | |
| dc.contributor.author | Schmidhuber, Jürgen | |
| dc.date.accessioned | 2015-09-04T00:06:57Z | |
| dc.date.available | 2015-09-04T00:06:57Z | |
| dc.date.issued | 2001 | |
| dc.description.abstract | We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments. | en_AU |
| dc.identifier.isbn | 90-393-2874-9 | en_AU |
| dc.identifier.issn | 1389-5184 | en_AU |
| dc.identifier.uri | http://hdl.handle.net/1885/15170 | |
| dc.publisher | Utrecht University | en_AU |
| dc.relation.ispartof | Proceedings of the fifth European Workshop on Reinforcement Learning (EWRL-5) | en_AU |
| dc.rights | © The Author(s) | en_AU |
| dc.title | Gradient-based Reinforcement Planning in Policy-Search Methods | en_AU |
| dc.type | Conference paper | en_AU |
| local.bibliographicCitation.lastpage | 29 | en_AU |
| local.bibliographicCitation.startpage | 27 | en_AU |
| local.contributor.affiliation | Hutter, M., Research School of Computer Science, The Australian National University | en_AU |
| local.contributor.authoruid | u4350841 | en_AU |
| local.identifier.citationvolume | 27 | en_AU |
| local.type.status | Published Version | en_AU |