Skip navigation
Skip navigation

Near-optimal PAC bounds for discounted MDPs

Lattimore, Tor; Hutter, Marcus


We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (mdps). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (ucrl) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by...[Show more]

CollectionsANU Research Publications
Date published: 2014-09-18
Type: Journal article
Source: Theoretical Computer Science
DOI: 10.1016/j.tcs.2014.09.029


File Description SizeFormat Image
Lattimore and Hutter Near-optimal PAC Bounds 2014.pdf440.18 kBAdobe PDFThumbnail

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated:  20 July 2017/ Responsible Officer:  University Librarian/ Page Contact:  Library Systems & Web Coordinator