On Q-learning convergence for non-Markov decision processes

Authors

Majeed, Sultan
Hutter, Marcus

Publisher

AAAI Press

Abstract

Temporal-difference (TD) learning is an attractive, computationally efficient framework for model-free reinforcement learning. Q-learning is one of the most widely used TD learning techniques, enabling an agent to learn the optimal action-value function, i.e. the Q-value function. Despite its widespread use, Q-learning has only been proven to converge on Markov Decision Processes (MDPs) and Q-uniform abstractions of finite-state MDPs. On the other hand, most real-world problems are inherently non-Markovian: the full true state of the environment is not revealed by recent observations. In this paper, we investigate the behavior of Q-learning when applied to non-MDP and non-ergodic domains which may have infinitely many underlying states. We prove that the convergence guarantee of Q-learning can be extended to a class of such non-MDP problems, in particular, to some non-stationary domains. We show that state-uniformity of the optimal Q-value function is a necessary and sufficient condition for Q-learning to converge, even in the case of infinitely many internal states.
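
For orientation, below is a minimal Python sketch of the tabular Q-learning update the abstract refers to, using an epsilon-greedy behavior policy. This illustrates the standard algorithm only, not the paper's non-MDP construction; the environment interface (actions, reset, step) and all hyperparameter values are illustrative assumptions.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # env is a hypothetical gym-style environment assumed to expose:
    #   env.actions        -> list of available actions
    #   env.reset()        -> initial state
    #   env.step(action)   -> (next_state, reward, done)
    Q = defaultdict(float)  # Q[(state, action)] -> current value estimate

    def greedy(state):
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = greedy(state)
            next_state, reward, done = env.step(action)
            # TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a));
            # the future-value term is zero at terminal states
            future = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * future - Q[(state, action)])
            state = next_state
    return Q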

Source

IJCAI International Joint Conference on Artificial Intelligence

Access Statement

Free Access via publisher website

Restricted until

2099-12-31
