
General discounting versus average reward

Hutter, Marcus

Description

Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbitrary (non-geometric) discount sequences and arbitrary reward sequences (non-MDP environments). We show that asymptotically U for m→∞ and V for k→∞ are equal, provided both limits exist. Further, if...
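The two quantities compared above can be illustrated numerically. This is a minimal sketch, not the paper's method, and the exact definitions are assumptions made for illustration rather than quotations from the abstract: the average value is taken as U_1m = (r_1 + ... + r_m) / m, and the discounted value as V_k = (1/Γ_k) Σ_{i≥k} γ_i r_i with Γ_k = Σ_{i≥k} γ_i, i.e. future reward normalized by the remaining discount mass. The reward sequence r_i = 1 + 1/i, the non-geometric power discount γ_i = 1/i², and the helper names `average_value` and `discounted_value` are all hypothetical. Because V_k is an infinite sum, the sketch truncates it after a finite `horizon`.

```python
from typing import Callable


def average_value(r: Callable[[int], float], m: int) -> float:
    """Average reward U over cycles 1..m (assumed definition)."""
    return sum(r(i) for i in range(1, m + 1)) / m


def discounted_value(r: Callable[[int], float],
                     gamma: Callable[[int], float],
                     k: int, horizon: int) -> float:
    """Normalized discounted reward V from cycle k (assumed definition).

    The infinite tail is truncated after `horizon` cycles, so this is an
    approximation; normalizing by the truncated discount sum keeps the
    result a weighted average of the rewards in the window.
    """
    weighted_sum = 0.0
    discount_mass = 0.0
    for i in range(k, k + horizon):
        g = gamma(i)
        weighted_sum += g * r(i)
        discount_mass += g
    return weighted_sum / discount_mass


def reward(i: int) -> float:
    # Hypothetical reward sequence that converges to 1.
    return 1.0 + 1.0 / i


def power_discount(i: int) -> float:
    # Hypothetical non-geometric discount sequence.
    return 1.0 / i**2


U = average_value(reward, 100_000)
V = discounted_value(reward, power_discount, k=1_000, horizon=200_000)
print(f"U = {U:.5f}, V = {V:.5f}")
```

Both values come out close to 1, the limit of the reward sequence, consistent with the stated result that the two limits agree when both exist. A finite computation like this only illustrates the asymptotic claim; it does not prove it.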

Collections: ANU Research Publications
Date published: 2006
Type: Conference paper
URI: http://hdl.handle.net/1885/15029
DOI: 10.1007/11894841_21

Download: Hutter General Discounting versus Average Reward 2006.pdf (228.12 kB, Adobe PDF)


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
