General discounting versus average reward
Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (d
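The two quantities being compared can be illustrated with a short Python sketch. It rests on assumptions that the abstract above does not state: the discounted reward is normalized by the sum of the remaining discount weights, the infinite sum is truncated to a finite reward sequence, and the discount is geometric with a hypothetical factor `gamma`. The function names are illustrative only.

```python
def average_value(rewards, m):
    """Average reward over cycles 1..m (cycles are 1-indexed)."""
    return sum(rewards[:m]) / m


def discounted_value(rewards, discounts, k):
    """Discounted reward from cycle k onward.

    Assumption: the value is normalized by the sum of the remaining
    discount weights, and the infinite horizon is truncated to the
    length of the supplied sequences.
    """
    tail_rewards = rewards[k - 1:]
    tail_discounts = discounts[k - 1:]
    normalizer = sum(tail_discounts)
    weighted = sum(g * r for g, r in zip(tail_discounts, tail_rewards))
    return weighted / normalizer


# Hypothetical example: alternating rewards, geometric discount.
horizon = 1000
gamma = 0.9  # assumed discount factor, not taken from the source
rewards = [1.0 if i % 2 == 0 else 0.0 for i in range(horizon)]
discounts = [gamma ** i for i in range(1, horizon + 1)]

U = average_value(rewards, m=horizon)
V = discounted_value(rewards, discounts, k=1)
print(U, V)
```

With a constant reward sequence both functions return the same number, which gives a quick sanity check of the normalization.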
|01_Hutter_General_discounting_versus_2006.pdf|130.88 kB|Adobe PDF|
|02_Hutter_General_discounting_versus_2006.pdf|160.2 kB|Adobe PDF|
|03_Hutter_General_discounting_versus_2006.pdf|399.42 kB|Adobe PDF|