Weaver, L; Tao, Nigel
The Optimal Reward Baseline for Gradient-Based Reinforcement Learning
2001
August 2 2
1558608001
http://hdl.handle.net/1885/63665
2015-12-10