Gradient-based Reinforcement Planning in Policy-Search Methods
Authors
Kwee, Ivo
Hutter, Marcus
Schmidhuber, Jürgen
Publisher
Utrecht University
Abstract
We introduce a learning method called "gradient-based reinforcement
planning" (GREP). Unlike traditional dynamic programming (DP) methods that improve their
policy backwards in time, GREP is a gradient-based method that plans
ahead and improves its policy before it actually acts in the
environment. We derive formulas for the exact policy gradient that
maximizes the expected future reward and confirm our ideas
with numerical experiments.
Book Title
Proceedings of the Fifth European Workshop on Reinforcement Learning (EWRL-5)