Gradient-based Reinforcement Planning in Policy-Search Methods

Kwee, Ivo; Hutter, Marcus; Schmidhuber, Jürgen

Date

2001

Authors

Kwee, Ivo

Hutter, Marcus

Schmidhuber, Jürgen

Publisher

Utrecht University

Abstract

We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments.

URI

http://hdl.handle.net/1885/15170

Collections

ANU Research Publications

Type

Conference paper

Book Title

Proceedings of the fifth European Workshop on Reinforcement Learning (EWRL-5)
