
Gradient-based Reinforcement Planning in Policy-Search Methods

Kwee, Ivo; Hutter, Marcus; Schmidhuber, Jürgen


We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas ...

Collections: ANU Research Publications
Date published: 2001
Type: Conference paper



Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
