Temporal difference updating without a learning rate
We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so-called TD(λ); however, it lacks the parameter α that specifies the learning rate. In the place of this free parameter there is now an equation for the...
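For orientation, the standard TD(λ) update that the abstract compares against can be sketched as follows. This is a minimal tabular version with accumulating eligibility traces, shown only to make the role of the learning rate α concrete. It is not the learning-rate-free rule derived in the paper, and the function name, argument names, and default values are illustrative assumptions.

```python
import numpy as np


def td_lambda_episode(values, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Conventional tabular TD(lambda) with accumulating traces.

    Assumed (hypothetical) interface: `values` is a 1-D array of state value
    estimates and `transitions` is a sequence of
    (state, reward, next_state) tuples with integer state indices.
    Returns an updated copy of `values`.
    """
    values = np.asarray(values, dtype=float).copy()
    traces = np.zeros_like(values)  # one eligibility trace per state
    for state, reward, next_state in transitions:
        # Temporal difference error for this transition.
        delta = reward + gamma * values[next_state] - values[state]
        # Decay all traces, then bump the trace of the state just visited.
        traces *= gamma * lam
        traces[state] += 1.0
        # The free parameter alpha scales every update; this is the
        # parameter that the abstract says the derived equation lacks.
        values += alpha * delta * traces
    return values


# Two-step trajectory 0 -> 1 -> 2 with a reward of 1 on the second step.
updated = td_lambda_episode(
    np.zeros(3), [(0, 0.0, 1), (1, 1.0, 2)], alpha=0.5, gamma=1.0, lam=1.0
)
print(updated)  # values become [0.5, 0.5, 0.0]
```

With λ = 1 and γ = 1 the trace of state 0 is still fully active when the reward arrives, so both visited states are credited. With λ = 0 only the most recently visited state would be updated.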
Source: Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference