Gradient-based Reinforcement Planning in Policy-Search Methods

Authors

Kwee, Ivo
Hutter, Marcus
Schmidhuber, Jürgen

Publisher

Utrecht University

Abstract

We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments.
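
The idea of improving a policy by following an exact gradient of the expected future reward, computed from a model before acting, can be illustrated with a minimal sketch. This is not the paper's GREP derivation: it assumes a known tabular MDP, a softmax policy, and a discounted return, and it uses the standard tabular policy-gradient identity. All names (`softmax_policy`, `expected_return_and_gradient`, `plan`) and the two-state example MDP are hypothetical.

```python
import numpy as np


def softmax_policy(theta):
    """Row-wise softmax: pi[s, a] is the probability of action a in state s."""
    z = theta - theta.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def expected_return_and_gradient(theta, P, R, mu, gamma):
    """Exact discounted return J(theta) and its gradient for a known model.

    P[a, s, n] : probability of moving from state s to state n under action a
    R[s, a]    : expected immediate reward
    mu[s]      : start-state distribution
    """
    pi = softmax_policy(theta)
    n_states = theta.shape[0]
    P_pi = np.einsum("sa,asn->sn", pi, P)      # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)                # expected reward per state under pi
    M = np.eye(n_states) - gamma * P_pi
    V = np.linalg.solve(M, r_pi)               # state values
    d = np.linalg.solve(M.T, mu)               # discounted state occupancy
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    # Softmax identity: dJ/dtheta[s, a] = d[s] * pi[s, a] * (Q[s, a] - V[s])
    grad = d[:, None] * pi * (Q - V[:, None])
    return mu @ V, grad


def plan(theta, P, R, mu, gamma, lr=0.5, steps=200):
    """Gradient ascent on the model alone; no environment interaction."""
    for _ in range(steps):
        _, grad = expected_return_and_gradient(theta, P, R, mu, gamma)
        theta = theta + lr * grad
    return theta


# Hypothetical two-state, two-action MDP: being in state 1 pays reward 1,
# and action 1 tends to lead to state 1.
P = np.array([
    [[0.9, 0.1], [0.8, 0.2]],   # action 0
    [[0.1, 0.9], [0.2, 0.8]],   # action 1
])
R = np.array([[0.0, 0.0], [1.0, 1.0]])
mu = np.array([1.0, 0.0])
gamma = 0.9

theta0 = np.zeros((2, 2))                       # uniform initial policy
theta_planned = plan(theta0, P, R, mu, gamma)   # policy improved before acting
```

The gradient here is exact because the model is assumed known and small enough to solve the linear systems directly; a sampled or learned model would need a different estimator.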

Book Title

Proceedings of the fifth European Workshop on Reinforcement Learning (EWRL-5)
