Hutter, Marcus; Legg, Shane
We derive an equation for temporal difference learning from statistical principles.
Specifically, we start with the variational principle and then bootstrap to produce
an updating rule for discounted state value estimates. The resulting equation is
similar to the standard equation for temporal difference learning with eligibility
traces, the so-called TD(λ); however, it lacks the parameter α that specifies the
learning rate. In place of this free parameter there is now an equation for...
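
For orientation, the standard TD(λ) rule that the abstract compares against can be sketched as below. This is a minimal illustration of the textbook tabular update with accumulating eligibility traces and a fixed learning rate α; it is not the equation derived in the paper, which replaces α. The function name `td_lambda_episode`, the transition format, and the default parameter values are assumptions made for this example.

```python
def td_lambda_episode(values, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Textbook tabular TD(lambda) with accumulating traces over one episode.

    values      -- dict mapping each state to its current value estimate
    transitions -- list of (state, reward, next_state) tuples (assumed format)
    alpha       -- fixed learning rate (the free parameter the abstract refers to)
    gamma       -- discount factor
    lam         -- eligibility trace decay parameter (lambda)
    """
    # Every state starts the episode with zero eligibility.
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in transitions:
        # Temporal difference error for the observed transition.
        delta = reward + gamma * values[next_state] - values[state]
        # Mark the visited state as eligible for credit.
        traces[state] += 1.0
        for s in values:
            # Move each estimate in proportion to its eligibility.
            values[s] += alpha * delta * traces[s]
            # Decay eligibility so older visits receive less credit.
            traces[s] *= gamma * lam
    return values


if __name__ == "__main__":
    v = {"A": 0.0, "B": 0.0, "T": 0.0}
    episode = [("A", 0.0, "B"), ("B", 1.0, "T")]
    print(td_lambda_episode(v, episode, alpha=0.5, gamma=1.0, lam=1.0))
```

With λ = 1 and γ = 1 the reward at the end of the episode is credited to both visited states; with λ = 0 only the most recently visited state is updated, which is ordinary one-step TD.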