
General discounting versus average reward

Hutter, Marcus

Description

Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value).
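
The two quantities compared in the description can be written as follows; this is a minimal sketch assuming bounded rewards r_i and a summable discount sequence γ_i with tail sum Γ_k (notation assumed for illustration, not taken from this record):

\[
U_{1m} \;=\; \frac{1}{m}\sum_{i=1}^{m} r_i ,
\qquad
V_{k\gamma} \;=\; \frac{1}{\Gamma_k}\sum_{i=k}^{\infty} \gamma_i r_i ,
\qquad
\Gamma_k \;=\; \sum_{i=k}^{\infty} \gamma_i .
\]

Under these assumptions, the comparison is between the limiting behaviour of U_{1m} as m → ∞ and of V_{kγ} as k → ∞.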

Collections: ANU Research Publications
Date published: 2006
Type: Conference paper
URI: http://hdl.handle.net/1885/28236
Source: Proceedings of International Conference on Algorithmic Learning Theory (ALT 2006)

Download

File                                            Size       Format     Access
01_Hutter_General_discounting_versus_2006.pdf   130.88 kB  Adobe PDF  Request a copy
02_Hutter_General_discounting_versus_2006.pdf   160.2 kB   Adobe PDF  Request a copy
03_Hutter_General_discounting_versus_2006.pdf   399.42 kB  Adobe PDF  Request a copy


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
