On Q-learning convergence for non-Markov decision processes
| dc.contributor.author | Majeed, Sultan | |
| dc.contributor.author | Hutter, Marcus | |
| dc.contributor.editor | Lang, Jérôme | |
| dc.coverage.spatial | Stockholm, Sweden | |
| dc.date.accessioned | 2024-02-12T21:39:48Z | |
| dc.date.created | July 13-19, 2018 | |
| dc.date.issued | 2018 | |
| dc.date.updated | 2022-10-02T07:19:28Z | |
| dc.description.abstract | Temporal-difference (TD) learning is an attractive, computationally efficient framework for model-free reinforcement learning. Q-learning is one of the most widely used TD learning techniques, enabling an agent to learn the optimal action-value function, i.e. the Q-value function. Despite its widespread use, Q-learning has only been proven to converge on Markov Decision Processes (MDPs) and Q-uniform abstractions of finite-state MDPs. Most real-world problems, however, are inherently non-Markovian: the full true state of the environment is not revealed by recent observations. In this paper, we investigate the behavior of Q-learning when applied to non-MDP and non-ergodic domains that may have infinitely many underlying states. We prove that the convergence guarantee of Q-learning can be extended to a class of such non-MDP problems, in particular to some non-stationary domains. We show that state-uniformity of the optimal Q-value function is a necessary and sufficient condition for Q-learning to converge, even in the case of infinitely many internal states. | en_AU |
| dc.format.mimetype | application/pdf | en_AU |
| dc.identifier.isbn | 978-0-9992411-2-7 | en_AU |
| dc.identifier.uri | http://hdl.handle.net/1885/313408 | |
| dc.language.iso | en_AU | en_AU |
| dc.publisher | AAAI Press | en_AU |
| dc.relation.ispartofseries | 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 | en_AU |
| dc.rights | © 2018 AAAI Press | en_AU |
| dc.source | IJCAI International Joint Conference on Artificial Intelligence | en_AU |
| dc.title | On Q-learning convergence for non-Markov decision processes | en_AU |
| dc.type | Conference paper | en_AU |
| dcterms.accessRights | Free Access via publisher website | en_AU |
| local.bibliographicCitation.lastpage | 2552 | en_AU |
| local.bibliographicCitation.startpage | 2546 | en_AU |
| local.contributor.affiliation | Majeed, Sultan, College of Engineering and Computer Science, ANU | en_AU |
| local.contributor.affiliation | Hutter, Marcus, College of Engineering and Computer Science, ANU | en_AU |
| local.contributor.authoruid | Majeed, Sultan, u5447242 | en_AU |
| local.contributor.authoruid | Hutter, Marcus, u4350841 | en_AU |
| local.description.embargo | 2099-12-31 | |
| local.description.notes | Imported from ARIES | en_AU |
| local.description.refereed | Yes | |
| local.identifier.absfor | 460306 - Image processing | en_AU |
| local.identifier.absfor | 460205 - Intelligent robotics | en_AU |
| local.identifier.ariespublication | u3102795xPUB1733 | en_AU |
| local.identifier.doi | 10.24963/ijcai.2018/353 | en_AU |
| local.identifier.scopusID | 2-s2.0-85055674171 | |
| local.publisher.url | https://www.ijcai.org/proceedings/2018/353 | en_AU |
| local.type.status | Published Version | en_AU |
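For context on the algorithm the abstract analyses, below is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It is illustrative only: the environment interface (`env.reset`, `env.step`, `env.actions`) and all hyperparameter values are assumptions, not taken from the paper. The paper's contribution concerns when this standard update provably converges on non-Markovian and non-ergodic domains; the sketch itself assumes nothing about the Markov property of `env`.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `env` is a hypothetical environment exposing reset() -> state,
    step(action) -> (next_state, reward, done), and a list `actions`.
    """
    Q = defaultdict(float)  # maps (state, action) -> estimated Q-value, default 0.0

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection over the action set.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # TD target: r + gamma * max_a' Q(s', a'); zero bootstrap at terminal states.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            td_target = reward + gamma * best_next

            # Q-learning update: move the current estimate toward the TD target.
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            state = next_state
    return Q
```

In the paper's setting, `state` would be whatever internal representation the agent maintains over observations; the result is that the update above converges to the optimal Q-values whenever those values are uniform across the (possibly infinitely many) underlying environment states mapped to the same agent state.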