Near-optimal PAC bounds for discounted MDPs
-
Altmetric Citations
Lattimore, Tor; Hutter, Marcus
Description
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (mdps). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (ucrl) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by...[Show more]
Collections | ANU Research Publications |
---|---|
Date published: | 2014 |
Type: | Journal article |
URI: | http://hdl.handle.net/1885/58388 |
Source: | Theoretical Computer Science |
DOI: | 10.1016/j.tcs.2014.09.029 |
Download
File | Description | Size | Format | Image |
---|---|---|---|---|
01_Lattimore_Near-optimal_PAC_bounds_for_2014.pdf | 468.63 kB | Adobe PDF | Request a copy |
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
Updated: 17 November 2022/ Responsible Officer: University Librarian/ Page Contact: Library Systems & Web Coordinator