General-purpose, intelligent, learning agents cycle through sequences of
observations, actions, and rewards that are complex, uncertain, unknown, and
non-Markovian. On the other hand, reinforcement learning is well-developed
for small finite state Markov decision processes (MDPs). Up to now, extracting
the right state representations out of bare observations, that is, reducing
the general agent setup to the MDP framework, is an art that involves significant
effort by designers. The...[Show more]
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.