A Monte-Carlo AIXI Approximation

dc.contributor.author: Veness, Joel
dc.contributor.author: Ng, Kee Siong
dc.contributor.author: Hutter, Marcus
dc.contributor.author: Uther, William
dc.contributor.author: Silver, David
dc.date.accessioned: 2015-08-24T06:29:36Z
dc.date.available: 2015-08-24T06:29:36Z
dc.date.issued: 2011-01
dc.description.abstract: This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a variety of stochastic and partially observable domains. We conclude by proposing a number of directions for future research. (en_AU)
dc.identifier.issn: 1076-9757 (en_AU)
dc.identifier.uri: http://hdl.handle.net/1885/14906
dc.provenance: https://v2.sherpa.ac.uk/id/publication/32771/..."published version can be archived in institutional repository" from SHERPA/RoMEO site as at 14/05/2024
dc.publisher: AAAI Press (en_AU)
dc.relation: http://purl.org/au-research/grants/arc/DP0988049 (en_AU)
dc.source: Journal of Artificial Intelligence Research (en_AU)
dc.subject: Reinforcement Learning (RL) (en_AU)
dc.subject: Context Tree Weighting (CTW) (en_AU)
dc.subject: Monte Carlo Tree Search (MCTS) (en_AU)
dc.subject: Upper Confidence bounds applied to Trees (UCT) (en_AU)
dc.subject: Partially Observable Markov Decision Process (POMDP) (en_AU)
dc.subject: Prediction Suffix Trees (PST) (en_AU)
dc.title: A Monte-Carlo AIXI Approximation (en_AU)
dc.type: Journal article (en_AU)
dcterms.accessRights: Open Access
local.bibliographicCitation.lastpage: 142 (en_AU)
local.bibliographicCitation.startpage: 95 (en_AU)
local.contributor.affiliation: Ng, K. S., Research School of Computer Science, The Australian National University (en_AU)
local.contributor.affiliation: Hutter, M., Research School of Computer Science, The Australian National University (en_AU)
local.contributor.authoremail: keesiong.ng@gmail.com (en_AU)
local.contributor.authoremail: marcus.hutter@anu.edu.au (en_AU)
local.contributor.authoruid: u4350841 (en_AU)
local.identifier.citationvolume: 40 (en_AU)
local.identifier.doi: 10.1613/jair.3125 (en_AU)
local.identifier.uidSubmittedBy: u1005913 (en_AU)
local.publisher.url: http://jair.org/ (en_AU)
local.type.status: Published Version (en_AU)
