On Q-learning convergence for non-Markov decision processes

dc.contributor.author: Majeed, Sultan
dc.contributor.author: Hutter, Marcus
dc.contributor.editor: Lang, Jerome
dc.coverage.spatial: Stockholm, Sweden
dc.date.accessioned: 2024-02-12T21:39:48Z
dc.date.created: July 13-19 2018
dc.date.issued: 2018
dc.date.updated: 2022-10-02T07:19:28Z
dc.description.abstract: Temporal-difference (TD) learning is an attractive, computationally efficient framework for model-free reinforcement learning. Q-learning is one of the most widely used TD learning techniques; it enables an agent to learn the optimal action-value function, i.e., the Q-value function. Despite its widespread use, Q-learning has only been proven to converge on Markov Decision Processes (MDPs) and Q-uniform abstractions of finite-state MDPs. On the other hand, most real-world problems are inherently non-Markovian: the full true state of the environment is not revealed by recent observations. In this paper, we investigate the behavior of Q-learning when applied to non-MDP and non-ergodic domains which may have infinitely many underlying states. We prove that the convergence guarantee of Q-learning can be extended to a class of such non-MDP problems, in particular, to some non-stationary domains. We show that state-uniformity of the optimal Q-value function is a necessary and sufficient condition for Q-learning to converge even in the case of infinitely many internal states. [en_AU]
dc.format.mimetype: application/pdf [en_AU]
dc.identifier.isbn: 978-099924112-7 [en_AU]
dc.identifier.uri: http://hdl.handle.net/1885/313408
dc.language.iso: en_AU [en_AU]
dc.publisher: AAAI Press [en_AU]
dc.relation.ispartofseries: 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 [en_AU]
dc.rights: © 2018 AAAI Press [en_AU]
dc.source: IJCAI International Joint Conference on Artificial Intelligence [en_AU]
dc.title: On Q-learning convergence for non-Markov decision processes [en_AU]
dc.type: Conference paper [en_AU]
dcterms.accessRights: Free Access via publisher website [en_AU]
local.bibliographicCitation.lastpage: 2552 [en_AU]
local.bibliographicCitation.startpage: 2546 [en_AU]
local.contributor.affiliation: Majeed, Sultan, College of Engineering and Computer Science, ANU [en_AU]
local.contributor.affiliation: Hutter, Marcus, College of Engineering and Computer Science, ANU [en_AU]
local.contributor.authoruid: Majeed, Sultan, u5447242 [en_AU]
local.contributor.authoruid: Hutter, Marcus, u4350841 [en_AU]
local.description.embargo: 2099-12-31
local.description.notes: Imported from ARIES [en_AU]
local.description.refereed: Yes
local.identifier.absfor: 460306 - Image processing [en_AU]
local.identifier.absfor: 460205 - Intelligent robotics [en_AU]
local.identifier.ariespublication: u3102795xPUB1733 [en_AU]
local.identifier.doi: 10.24963/ijcai.2018/353 [en_AU]
local.identifier.scopusID: 2-s2.0-85055674171
local.publisher.url: https://www.ijcai.org/proceedings/2018/353 [en_AU]
local.type.status: Published Version [en_AU]

Downloads

Original bundle

Name: 0353.pdf
Size: 266.28 KB
Format: Adobe Portable Document Format