
General discounting versus average reward

Hutter, Marcus


Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (d
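The two quantities being compared can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the discounted reward is normalized by the sum of the discount weights, the infinite sum is truncated at the end of a finite reward list, and the names `average_reward`, `discounted_value`, and `discount` are hypothetical, chosen only for this illustration.

```python
from typing import Callable, Sequence


def average_reward(rewards: Sequence[float], m: int) -> float:
    """Average of the rewards received in cycles 1..m (cycles are 1-indexed)."""
    if not 1 <= m <= len(rewards):
        raise ValueError("m must lie between 1 and the number of rewards")
    return sum(rewards[:m]) / m


def discounted_value(
    rewards: Sequence[float],
    k: int,
    discount: Callable[[int], float],
) -> float:
    """Discount-weighted average of the rewards from cycle k onward.

    Assumption: the value is normalized by the total discount weight, and the
    sum to infinity is approximated by stopping at the last available reward.
    """
    if not 1 <= k <= len(rewards):
        raise ValueError("k must lie between 1 and the number of rewards")
    # discount(i) is the weight given to the reward of cycle i.
    weights = [discount(i) for i in range(k, len(rewards) + 1)]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("discount weights must have a positive sum")
    weighted_sum = sum(w * r for w, r in zip(weights, rewards[k - 1:]))
    return weighted_sum / total_weight


if __name__ == "__main__":
    rewards = [0.0, 1.0] * 500  # hypothetical alternating reward sequence
    print(average_reward(rewards, m=1000))
    print(discounted_value(rewards, k=1, discount=lambda i: 0.99 ** i))
```

Here a geometric discount stands in for a general discount sequence; any positive function of the cycle index can be passed as `discount`.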

Collections: ANU Research Publications
Date published: 2006
Type: Conference paper
Source: Proceedings of International Conference on Algorithmic Learning Theory (ALT 2006)


File    Size    Format
01_Hutter_General_discounting_versus_2006.pdf    130.88 kB    Adobe PDF
02_Hutter_General_discounting_versus_2006.pdf    160.2 kB    Adobe PDF
03_Hutter_General_discounting_versus_2006.pdf    399.42 kB    Adobe PDF

