
General discounting versus average reward

Hutter, Marcus


Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbitrary (non-geometric) discount sequences and arbitrary reward sequences (non-MDP environments). We show that asymptotically U for m→∞ and V for k→∞ are equal, provided both limits exist. Further, if ...
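The claimed equivalence can be illustrated numerically. The sketch below is not from the paper; it assumes an illustrative reward sequence r_i = 1 + 1/i (converging to 1) and a non-geometric, summable discount sequence γ_i = 1/i², and checks that the average value U_m = (1/m)·Σ_{i=1..m} r_i and the normalized discounted value V_k = Σ_{i≥k} γ_i·r_i / Σ_{i≥k} γ_i both approach the same limit.

```python
def rewards(i):
    # Illustrative reward sequence converging to 1 (an assumption,
    # not a sequence taken from the paper).
    return 1.0 + 1.0 / i

def gamma(i):
    # Non-geometric (quadratic) discount sequence; summable, so the
    # normalization Gamma_k = sum_{i>=k} gamma_i is finite.
    return 1.0 / (i * i)

def average_value(m):
    # U_m = (1/m) * sum_{i=1}^{m} r_i
    return sum(rewards(i) for i in range(1, m + 1)) / m

def discounted_value(k, horizon=200000):
    # V_k = (1/Gamma_k) * sum_{i=k}^{inf} gamma_i * r_i,
    # with the infinite sum truncated at `horizon`.
    num = sum(gamma(i) * rewards(i) for i in range(k, horizon))
    den = sum(gamma(i) for i in range(k, horizon))
    return num / den

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, average_value(n), discounted_value(n))
```

For this choice of sequences both values visibly drift toward 1 as n grows, consistent with the abstract's statement that U (for m→∞) and V (for k→∞) agree when both limits exist.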

Collections: ANU Research Publications
Date published: 2006
Type: Conference paper
DOI: 10.1007/11894841_21


File: Hutter General Discounting versus Average Reward 2006.pdf (228.12 kB, Adobe PDF)

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated: 20 July 2017 / Responsible Officer: University Librarian / Page Contact: Library Systems & Web Coordinator