On Q-learning convergence for non-Markov decision processes
| dc.contributor.author | Majeed, Sultan | |
| dc.contributor.author | Hutter, Marcus | |
| dc.contributor.editor | Lang, Jérôme | |
| dc.coverage.spatial | Stockholm, Sweden | |
| dc.date.accessioned | 2024-02-12T21:39:48Z | |
| dc.date.created | July 13-19, 2018 | |
| dc.date.issued | 2018 | |
| dc.date.updated | 2022-10-02T07:19:28Z | |
| dc.description.abstract | Temporal-difference (TD) learning is an attractive, computationally efficient framework for model-free reinforcement learning. Q-learning is one of the most widely used TD learning techniques, enabling an agent to learn the optimal action-value function, i.e. the Q-value function. Despite its widespread use, Q-learning has only been proven to converge on Markov Decision Processes (MDPs) and Q-uniform abstractions of finite-state MDPs. Most real-world problems, however, are inherently non-Markovian: the full true state of the environment is not revealed by recent observations. In this paper, we investigate the behavior of Q-learning when applied to non-MDP and non-ergodic domains that may have infinitely many underlying states. We prove that the convergence guarantee of Q-learning can be extended to a class of such non-MDP problems, in particular to some non-stationary domains. We show that state-uniformity of the optimal Q-value function is a necessary and sufficient condition for Q-learning to converge, even in the case of infinitely many internal states. | en_AU |
| dc.format.mimetype | application/pdf | en_AU |
| dc.identifier.isbn | 978-0-9992411-2-7 | en_AU |
| dc.identifier.uri | http://hdl.handle.net/1885/313408 | |
| dc.language.iso | en_AU | en_AU |
| dc.publisher | AAAI Press | en_AU |
| dc.relation.ispartofseries | 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 | en_AU |
| dc.rights | © 2018 AAAI Press | en_AU |
| dc.source | IJCAI International Joint Conference on Artificial Intelligence | en_AU |
| dc.title | On Q-learning convergence for non-Markov decision processes | en_AU |
| dc.type | Conference paper | en_AU |
| dcterms.accessRights | Free Access via publisher website | en_AU |
| local.bibliographicCitation.lastpage | 2552 | en_AU |
| local.bibliographicCitation.startpage | 2546 | en_AU |
| local.contributor.affiliation | Majeed, Sultan, College of Engineering and Computer Science, ANU | en_AU |
| local.contributor.affiliation | Hutter, Marcus, College of Engineering and Computer Science, ANU | en_AU |
| local.contributor.authoruid | Majeed, Sultan, u5447242 | en_AU |
| local.contributor.authoruid | Hutter, Marcus, u4350841 | en_AU |
| local.description.embargo | 2099-12-31 | |
| local.description.notes | Imported from ARIES | en_AU |
| local.description.refereed | Yes | |
| local.identifier.absfor | 460306 - Image processing | en_AU |
| local.identifier.absfor | 460205 - Intelligent robotics | en_AU |
| local.identifier.ariespublication | u3102795xPUB1733 | en_AU |
| local.identifier.doi | 10.24963/ijcai.2018/353 | en_AU |
| local.identifier.scopusID | 2-s2.0-85055674171 | |
| local.publisher.url | https://www.ijcai.org/proceedings/2018/353 | en_AU |
| local.type.status | Published Version | en_AU |
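For context on the algorithm the abstract analyses, below is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It is illustrative only: the environment interface (`env.reset`, `env.step`, `env.actions`) and all hyperparameter values are assumptions, not taken from the paper. The paper's contribution concerns when this standard update provably converges on non-Markovian and non-ergodic domains; the sketch itself assumes nothing about the Markov property of `env`.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `env` is a hypothetical environment exposing reset() -> state,
    step(action) -> (next_state, reward, done), and a list `actions`.
    """
    Q = defaultdict(float)  # maps (state, action) -> estimated Q-value, default 0.0

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection over the action set.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # TD target: r + gamma * max_a' Q(s', a'); zero bootstrap at terminal states.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            td_target = reward + gamma * best_next

            # Q-learning update: move the current estimate toward the TD target.
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            state = next_state
    return Q
```

In the paper's setting, `state` would be whatever internal representation the agent maintains over observations; the result is that the update above converges to the optimal Q-values whenever those values are uniform across the (possibly infinitely many) underlying environment states mapped to the same agent state.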