Temporal Difference Updating without a Learning Rate
We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so called TD(λ), however it lacks the parameter α that specifies the learning rate. In the place of this free parameter there is now an equation for...[Show more]
|Collections||ANU Research Publications|
|Hutter and Legg Temporal Difference Updating 2007.pdf||183.71 kB||Adobe PDF|
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.