Cultural advice

The Australian National University acknowledges, celebrates and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work, and whose cultures are among the oldest continuing cultures in human history.

Aboriginal and Torres Strait Islander peoples are advised that ANU Library collections may include images, names, voices, and other representations of deceased persons.

Material in the collection may contain terms, language or views that reflect the period in which the item was created and may be considered inappropriate today.

Context tree maximizing reinforcement learning

Loading...
Thumbnail Image

Authors

Nguyen, Phuong
Sunehag, Peter
Hutter, Marcus

Journal Title

Journal ISSN

Volume Title

Publisher

Association for the Advancement of Artificial Intelligence

Abstract

Recent developments in reinforcement learning for nonMarkovian problems witness a surge in history-based methods, among which we are particularly interested in two frameworks, ΦMDP and MC-AIXI-CTW. ΦMDP attempts to reduce the general RL problem, where the environment’s states and dynamics are both unknown, to an MDP, while MCAIXI-CTW incrementally learns a mixture of context trees as its environment model. The main idea of ΦMDP is to connect generic reinforcement learning with classical reinforcement learning. The first implementation of ΦMDP relies on a stochastic search procedure for finding a tree that minimizes a certain cost function. This does not guarantee finding the minimizing tree, or even a good one, given limited search time. As a consequence it appears that the approach has difficulties with large domains. MC-AIXI-CTW is attractive in that it can incrementally and analytically compute the internal model through interactions with the environment. Unfortunately, it is computationally demanding due to requiring heavy planning simulations at every single time step. We devise a novel approach called CTMRL, which analytically and efficiently finds the cost-minimizing tree. Instead of the context-tree weighting method that MC-AIXI-CTW is based on, we use the closely related context-tree maximizing algorithm that selects just one single tree. This approach falls under the ΦMDP framework, which allows the replacement of the costly planning component of MC-AIXI-CTW with simple Q-Learning. Our empirical investigation shows that CTMRL finds policies of quality as good as MC-AIXI-CTW’s on six domains including a challenging Pacman domain, but in an order of magnitude less time.

Description

Citation

Source

Book Title

Proceedings of the 26th AAAI Conference on Artificial Intelligence

Entity type

Access Statement

License Rights

DOI

Restricted until

abcd