Temporal difference updating without a learning rate

Hutter, Marcus; Legg, Shane

Date

2008

Authors

Hutter, Marcus

Legg, Shane

Publisher

MIT Press

Abstract

We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so-called TD(λ); however, it lacks the parameter α that specifies the learning rate. In the place of this free parameter there is now an equation for the learning rate that is specific to each state transition. We experimentally test this new learning rule against TD(λ) and find that it offers superior performance in various settings. Finally, we make some preliminary investigations into how to extend our new temporal difference algorithm to reinforcement learning. To do this we combine our update equation with both Watkins' Q(λ) and Sarsa(λ) and find that it again offers superior performance without a learning rate parameter.
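For orientation, the standard TD(λ) baseline that the abstract compares against can be sketched as below. This is a minimal illustration of the textbook tabular update with accumulating eligibility traces and a fixed, hand-chosen learning rate; it is not the paper's learning-rate-free rule, which this record does not reproduce. The function name `td_lambda`, the `(state, reward, next_state)` transition format, the default parameter values, and the convention that unseen states have value 0 are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Assumed transition format for this sketch: (state, reward, next_state).
Transition = Tuple[Hashable, float, Hashable]


def td_lambda(
    transitions: Iterable[Transition],
    alpha: float = 0.1,   # the free learning-rate parameter the abstract refers to
    gamma: float = 0.9,   # discount factor
    lam: float = 0.8,     # eligibility-trace decay (the lambda in TD(lambda))
) -> Dict[Hashable, float]:
    """Textbook tabular TD(lambda) with accumulating traces (illustrative only)."""
    values: Dict[Hashable, float] = {}   # state value estimates, default 0
    traces: Dict[Hashable, float] = {}   # eligibility traces, default 0

    for state, reward, next_state in transitions:
        # Temporal difference error for this transition.
        delta = reward + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)

        # Accumulate the trace of the state just visited.
        traces[state] = traces.get(state, 0.0) + 1.0

        # Update every state with a trace using the same fixed alpha, then decay the trace.
        for s in list(traces):
            values[s] = values.get(s, 0.0) + alpha * delta * traces[s]
            traces[s] *= gamma * lam

    return values


if __name__ == "__main__":
    episode = [("A", 0.0, "B"), ("B", 1.0, "C")]
    print(td_lambda(episode, alpha=0.5, gamma=1.0, lam=1.0))
```

In this sketch the same `alpha` scales every update, so it has to be tuned by hand; the abstract's claim is that the derived rule replaces this constant with an expression specific to each state transition.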

Keywords

Eligibility traces; Free parameters; Learning rates; Learning rules; State transitions; Statistical principles; Temporal difference learning; Temporal differences; Temporal-difference algorithm; Variational principles; Reinforcement learning; Variational

URI

http://hdl.handle.net/1885/52259

Collections

ANU Research Publications

Source

Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference

Type

Conference paper

Restricted until

2037-12-31

Downloads

File

Description

01_Hutter_Temporal_difference_updating_2008.pdf (283.27 KB)
