
Constructing States for Reinforcement Learning

Mahmud, Hassan

Description

POMDPs are the models of choice for reinforcement learning (RL) tasks where the environment cannot be observed directly. In many applications the POMDP structure and parameters must be learned from experience, and this is considered a difficult problem. In this paper we address this issue by modeling the hidden environment with a novel class of models that are less expressive, but easier to learn and plan with, than POMDPs. We call these models deterministic Markov models (DMMs); they are deterministic-probabilistic finite automata from learning theory, extended with actions to the sequential (rather than i.i.d.) setting. Conceptually, we extend the Utile Suffix Memory method of McCallum to handle long-term memory. We describe DMMs, give Bayesian algorithms for learning and planning with them, and present experimental results on some standard POMDP tasks and on additional tasks that illustrate their efficacy.
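The abstract's central object, a deterministic Markov model, can be pictured as a probabilistic finite automaton whose hidden-state transitions are a deterministic function of the action taken and the observation received, while observations themselves are emitted stochastically. The Python sketch below illustrates only that structure; all names here (DMMSketch, delta, emission) are illustrative assumptions, not the paper's actual formulation or code.

import random

class DMMSketch:
    """Illustrative sketch (not the paper's code): hidden-state transitions
    are a deterministic function of (state, action, observation), while
    observations are sampled stochastically per (state, action)."""

    def __init__(self, n_states, actions, observations, seed=0):
        self.rng = random.Random(seed)
        self.states = list(range(n_states))
        self.observations = observations
        # Deterministic transition table: (state, action, observation) -> next state.
        self.delta = {(s, a, o): self.rng.choice(self.states)
                      for s in self.states for a in actions for o in observations}
        # Stochastic emissions: (state, action) -> distribution over observations
        # (uniform here purely as a placeholder).
        self.emission = {(s, a): [1.0 / len(observations)] * len(observations)
                         for s in self.states for a in actions}

    def step(self, state, action):
        """Sample an observation, then follow the deterministic transition."""
        obs = self.rng.choices(self.observations,
                               weights=self.emission[(state, action)], k=1)[0]
        return obs, self.delta[(state, action, obs)]

# Because the transition is deterministic given the action/observation pair,
# an agent that knows the model can track its hidden state exactly from the
# action-observation history, which is what makes models of this kind easier
# to plan with than general POMDPs.
dmm = DMMSketch(n_states=3, actions=["left", "right"], observations=["red", "green"])
state = 0
for a in ["left", "right", "left"]:
    obs, state = dmm.step(state, a)
    print(a, obs, state)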

dc.contributor.author: Mahmud, Hassan
dc.coverage.spatial: Haifa, Israel
dc.date.accessioned: 2015-12-08T22:30:18Z
dc.date.created: June 21, 2010
dc.identifier.isbn: 9781605589077
dc.identifier.uri: http://hdl.handle.net/1885/34380
dc.description.abstract: POMDPs are the models of choice for reinforcement learning (RL) tasks where the environment cannot be observed directly. In many applications the POMDP structure and parameters must be learned from experience, and this is considered a difficult problem. In this paper we address this issue by modeling the hidden environment with a novel class of models that are less expressive, but easier to learn and plan with, than POMDPs. We call these models deterministic Markov models (DMMs); they are deterministic-probabilistic finite automata from learning theory, extended with actions to the sequential (rather than i.i.d.) setting. Conceptually, we extend the Utile Suffix Memory method of McCallum to handle long-term memory. We describe DMMs, give Bayesian algorithms for learning and planning with them, and present experimental results on some standard POMDP tasks and on additional tasks that illustrate their efficacy.
dc.publisher: OmniPress
dc.relation.ispartofseries: International Conference on Machine Learning (ICML 2010)
dc.source: Proceedings of International Conference on Machine Learning (ICML 2010)
dc.subject: Keywords: Bayesian algorithms; Learning Theory; Long term memory; Markov model; Probabilistic finite automata; Markov processes; Reinforcement learning; Automata theory
dc.title: Constructing States for Reinforcement Learning
dc.type: Conference paper
local.description.notes: Imported from ARIES
local.description.refereed: Yes
dc.date.issued: 2010
local.identifier.absfor: 089999 - Information and Computing Sciences not elsewhere classified
local.identifier.ariespublication: u4963866xPUB112
local.type.status: Published Version
local.contributor.affiliation: Mahmud, Hassan, College of Engineering and Computer Science, ANU
local.description.embargo: 2037-12-31
local.bibliographicCitation.startpage: 8
local.identifier.absseo: 970108 - Expanding Knowledge in the Information and Computing Sciences
dc.date.updated: 2016-02-24T11:29:47Z
local.identifier.scopusID: 2-s2.0-77956529192
Collections: ANU Research Publications

Download

File: 01_Mahmud_Constructing_States_for_2010.pdf
Size: 385.68 kB
Format: Adobe PDF


