Near-optimal PAC bounds for discounted MDPs
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (mdps). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (ucrl) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by...[Show more]
|Collections||ANU Research Publications|
|Source:||Theoretical Computer Science|
|01_Lattimore_Near-optimal_PAC_bounds_for_2014.pdf||468.63 kB||Adobe PDF||Request a copy|
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.