Avoiding wireheading with value reinforcement learning

Everitt, Tom; Hutter, Marcus

Description

How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
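
The abstract describes the core VRL idea: rewards are treated as evidence about an unknown utility function rather than as the quantity to maximise directly. The short Python sketch below is only an informal illustration of that idea, not the construction from the paper; the candidate utility functions, the reward likelihood model, and the permitted() hook that stands in for the paper's belief-based anti-wireheading constraint are all assumptions introduced for the example.

# Illustrative sketch of the VRL idea described above (not the paper's method):
# the agent maintains a belief distribution over candidate utility functions,
# updates it from observed rewards, and acts to maximise expected utility under
# that belief. The constraint that removes the wireheading incentive is only a
# placeholder hook here (an assumption for this example).

CANDIDATE_UTILITIES = {
    "u_left":  lambda state, action: 1.0 if action == "left" else 0.0,
    "u_right": lambda state, action: 1.0 if action == "right" else 0.0,
}

class VRLAgent:
    def __init__(self, actions):
        self.actions = actions
        # Uniform prior belief over which candidate utility function is the true one.
        self.belief = {name: 1.0 / len(CANDIDATE_UTILITIES) for name in CANDIDATE_UTILITIES}

    def expected_utility(self, state, action):
        # Expected utility of an action under the current belief distribution.
        return sum(p * CANDIDATE_UTILITIES[name](state, action)
                   for name, p in self.belief.items())

    def permitted(self, state, action):
        # Placeholder for the paper's belief-based constraint on actions;
        # here every action is allowed (an assumption made for this sketch).
        return True

    def act(self, state):
        allowed = [a for a in self.actions if self.permitted(state, a)]
        return max(allowed, key=lambda a: self.expected_utility(state, a))

    def update(self, state, action, reward):
        # Bayesian update: treat the observed reward as noisy evidence about utility.
        posterior = {}
        for name, p in self.belief.items():
            predicted = CANDIDATE_UTILITIES[name](state, action)
            likelihood = 0.8 if abs(predicted - reward) < 0.5 else 0.2
            posterior[name] = p * likelihood
        total = sum(posterior.values())
        self.belief = {name: v / total for name, v in posterior.items()}

# Usage: rewards favouring "right" shift the belief towards u_right, which the
# agent then maximises.
agent = VRLAgent(actions=["left", "right"])
for _ in range(10):
    chosen = agent.act(state=None)
    reward = 1.0 if chosen == "right" else 0.0  # assumed toy environment
    agent.update(None, chosen, reward)
print(agent.belief)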

dc.contributor.author  Everitt, Tom
dc.contributor.author  Hutter, Marcus
dc.date.accessioned  2016-12-21T01:37:16Z
dc.date.available  2016-12-21T01:37:16Z
dc.identifier.issn  0302-9743
dc.identifier.uri  http://hdl.handle.net/1885/111445
dc.description.abstract  How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
dc.format  12 pages
dc.format.mimetype  application/pdf
dc.publisher  Springer Verlag (Germany)
dc.rights  © Springer International Publishing Switzerland 2016
dc.source  Lecture Notes in Computer Science
dc.subject  intelligent agent
dc.subject  reinforcement learning (RL)
dc.subject  wireheading problem
dc.subject  value reinforcement learning (VRL)
dc.subject  reward signal
dc.subject  utility function
dc.title  Avoiding wireheading with value reinforcement learning
dc.type  Journal article
local.description.notes  The article appears in the monographic series Lecture Notes in Computer Science as: Everitt T., Hutter M. (2016) Avoiding Wireheading with Value Reinforcement Learning. In: Steunebrink B., Wang P., Goertzel B. (eds) Artificial General Intelligence. AGI 2016. Lecture Notes in Computer Science, vol 9782. Springer, Cham.
local.identifier.citationvolume  9782
dc.date.issued  2016-06-25
local.publisher.url  http://link.springer.com/
local.type.status  Published Version
local.contributor.affiliation  Hutter, Marcus, Research School of Computer Science, College of Engineering and Computer Science, The Australian National University
local.contributor.affiliation  Everitt, Tom, Research School of Computer Science, College of Engineering and Computer Science, The Australian National University
local.bibliographicCitation.startpage  12
local.bibliographicCitation.lastpage  22
local.identifier.doi  10.1007/978-3-319-41649-6_2
dcterms.accessRights  Open Access
Collections  ANU Research Publications

There are no files associated with this item.


