
Temporal Difference Updating without a Learning Rate

Hutter, Marcus; Legg, Shane

Description

We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so-called TD(λ); however, it lacks the parameter α that specifies the learning rate. In the place of this free parameter there is now an equation for...
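For orientation, the standard TD(λ) update that the description compares against can be sketched as follows. This is a minimal tabular version with accumulating eligibility traces, not the learning-rate-free rule derived in the paper (the description above is truncated before it states that rule). The function name `td_lambda_episode`, its parameters, and the default values are assumptions chosen for illustration.

```python
import numpy as np


def td_lambda_episode(transitions, num_states, alpha=0.1, gamma=0.9, lam=0.8, values=None):
    """Standard tabular TD(lambda) with accumulating traces (illustrative sketch).

    transitions: iterable of (state, reward, next_state) tuples.
    alpha is the free learning-rate parameter that the paper's rule removes.
    """
    # Start from zeros unless initial value estimates are supplied.
    V = np.zeros(num_states) if values is None else np.asarray(values, dtype=float).copy()
    E = np.zeros(num_states)  # eligibility trace per state

    for s, r, s_next in transitions:
        delta = r + gamma * V[s_next] - V[s]  # temporal difference error
        E *= gamma * lam                      # decay all traces
        E[s] += 1.0                           # bump the trace of the visited state
        V += alpha * delta * E                # update scaled by the learning rate alpha
    return V


if __name__ == "__main__":
    # Hypothetical three-state chain: 0 -> 1 -> 2, reward 1 on the last step.
    print(td_lambda_episode([(0, 0.0, 1), (1, 1.0, 2)], num_states=3, alpha=0.5))
```

In this sketch every state shares the single hand-tuned constant `alpha`; that constant is the parameter the description says the derived equation no longer needs.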

Collections: ANU Research Publications
Date published: 2007-12
Type: Conference paper
URI: http://hdl.handle.net/1885/14999

Download

File: Hutter and Legg Temporal Difference Updating 2007.pdf
Size: 183.71 kB
Format: Adobe PDF


