Temporal difference updating without a learning rate

Hutter, Marcus; Legg, Shane

Description

We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so-called TD(λ); however, it lacks the parameter α that specifies the learning rate. In the place of this free parameter there is now an equation for the...
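For orientation, the standard TD(λ) update that the description compares against can be sketched as below. This is the textbook tabular form with accumulating eligibility traces, not the rule derived in the paper; the function name `td_lambda`, the episode encoding as `(state, reward, next_state)` tuples, and the default parameter values are all assumptions made for illustration. The point of the sketch is to show where the free learning rate α (`alpha`) enters.

```python
from typing import List, Optional, Sequence, Tuple

# One transition: (state, reward, next_state); next_state is None at episode end.
# This encoding is a hypothetical choice for the sketch.
Transition = Tuple[int, float, Optional[int]]


def td_lambda(
    episodes: Sequence[Sequence[Transition]],
    n_states: int,
    alpha: float = 0.1,   # learning rate: the free parameter the paper removes
    gamma: float = 0.9,   # discount factor
    lam: float = 0.8,     # eligibility trace decay (the lambda in TD(lambda))
) -> List[float]:
    """Standard tabular TD(lambda) with accumulating traces."""
    values = [0.0] * n_states
    for episode in episodes:
        traces = [0.0] * n_states  # traces are reset at the start of each episode
        for state, reward, next_state in episode:
            next_value = 0.0 if next_state is None else values[next_state]
            td_error = reward + gamma * next_value - values[state]
            traces[state] += 1.0
            for s in range(n_states):
                # Every state moves toward the TD target in proportion
                # to its eligibility, scaled by the fixed learning rate.
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam
    return values


# Two-state chain: 0 -> 1 (reward 0), then 1 -> terminal (reward 1).
chain = [[(0, 0.0, 1), (1, 1.0, None)]]
full_trace = td_lambda(chain, n_states=2, alpha=0.5, gamma=1.0, lam=1.0)
no_trace = td_lambda(chain, n_states=2, alpha=0.5, gamma=1.0, lam=0.0)
```

With `lam=1.0` the final reward is credited back to state 0 through its trace, while with `lam=0.0` only state 1 is updated. In both cases the step size is fixed by hand through `alpha`, which is the parameter the description says the derived equation lacks.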

Collections: ANU Research Publications
Date published: 2008
Type: Conference paper
URI: http://hdl.handle.net/1885/52259
Source: Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference

Download

File: 01_Hutter_Temporal_difference_updating_2008.pdf
Size: 283.27 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
