
Temporal Difference Updating without a Learning Rate

Hutter, Marcus; Legg, Shane


We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so-called TD(λ); however, it lacks the parameter α that specifies the learning rate. In the place of this free parameter there is now an equation for...
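
For orientation, the standard TD(λ) update that the abstract compares against can be sketched as below. This is the conventional textbook rule with accumulating eligibility traces, not the learning-rate-free equation derived in the paper. The function name, the tabular list representation, and the default parameter values are illustrative assumptions.

```python
def td_lambda_update(values, traces, state, next_state, reward,
                     alpha=0.1, gamma=0.9, lam=0.8):
    """One step of standard tabular TD(lambda) with accumulating traces.

    values and traces are lists indexed by state and are modified in place.
    alpha is the free learning-rate parameter that the paper's rule removes.
    """
    # TD error: bootstrapped one-step target minus the current estimate.
    delta = reward + gamma * values[next_state] - values[state]

    # Decay every eligibility trace, then credit the state just visited.
    for s in range(len(traces)):
        traces[s] *= gamma * lam
    traces[state] += 1.0

    # Move each state's value in proportion to its trace, scaled by alpha.
    for s in range(len(values)):
        values[s] += alpha * delta * traces[s]
    return delta


# Hypothetical three-state chain: 0 -> 1 -> 2.
values = [0.0, 0.0, 0.0]
traces = [0.0, 0.0, 0.0]
td_lambda_update(values, traces, state=0, next_state=1, reward=1.0, alpha=0.5)
td_lambda_update(values, traces, state=1, next_state=2, reward=2.0, alpha=0.5)
```

In this sketch α has to be chosen by hand, and it scales every update uniformly. That hand-tuned parameter is the one the abstract says the derived equation does without.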

Collections: ANU Research Publications
Date published: 2007-12
Type: Conference paper
Book Title: Advances in Neural Information Processing Systems 20


File: Hutter and Legg Temporal Difference Updating 2007.pdf
Size: 183.71 kB
Format: Adobe PDF

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
