Avoiding wireheading with value reinforcement learning

Everitt, Tom; Hutter, Marcus

doi:10.1007/978-3-319-41649-6_2

Description

How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
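
As a rough illustration of the idea in the abstract, the Python sketch below uses a toy world with hypothetical actions (`make_coffee`, `do_nothing`, `hack_sensor`), two hypothetical candidate utility functions, and an assumed sensor model in which a hacked sensor always reports the maximum reward of 1.0. The agent treats observed reward as evidence about which utility function is true. The constraint shown here, which rejects any action whose predicted sensor-reward distribution differs from the utility distribution implied by the agent's beliefs, is an illustrative stand-in written for this example, not the formal constraint defined in the paper.

```python
from collections import defaultdict
from typing import Dict

# Toy world (hypothetical): each action leads to a state of the same name.
ACTIONS = ["make_coffee", "do_nothing", "hack_sensor"]

# Hypothetical finite set of candidate utility functions, mapping state -> utility.
UTILITIES: Dict[str, Dict[str, float]] = {
    "likes_coffee": {"make_coffee": 0.8, "do_nothing": 0.0, "hack_sensor": 0.0},
    "likes_rest": {"make_coffee": 0.0, "do_nothing": 0.6, "hack_sensor": 0.0},
}

Belief = Dict[str, float]  # probability assigned to each candidate utility function


def sensor_reward(state: str, u: Dict[str, float]) -> float:
    """Assumed sensor model: reports the true utility, but a hacked sensor always reports 1.0."""
    return 1.0 if state == "hack_sensor" else u[state]


def update(belief: Belief, state: str, reward: float) -> Belief:
    """Bayesian update: the observed reward is evidence about which utility function is true."""
    post = {name: p * (1.0 if sensor_reward(state, UTILITIES[name]) == reward else 0.0)
            for name, p in belief.items()}
    z = sum(post.values())
    if z == 0:
        raise ValueError("observed reward is impossible under every candidate utility function")
    return {name: p / z for name, p in post.items()}


def predicted_reward_dist(action: str, belief: Belief) -> Dict[float, float]:
    """Distribution of the reward the agent expects its sensor to emit after `action`."""
    dist: Dict[float, float] = defaultdict(float)
    for name, p in belief.items():
        dist[sensor_reward(action, UTILITIES[name])] += p
    return dist


def utility_implied_dist(action: str, belief: Belief) -> Dict[float, float]:
    """Distribution of the utility of the resulting state, according to the agent's beliefs."""
    dist: Dict[float, float] = defaultdict(float)
    for name, p in belief.items():
        dist[UTILITIES[name][action]] += p
    return dist


def consistent(action: str, belief: Belief, tol: float = 1e-9) -> bool:
    """Illustrative constraint: both distributions must agree on every value."""
    b = predicted_reward_dist(action, belief)
    c = utility_implied_dist(action, belief)
    return all(abs(b.get(r, 0.0) - c.get(r, 0.0)) <= tol for r in set(b) | set(c))


def expected_utility(action: str, belief: Belief) -> float:
    return sum(p * UTILITIES[name][action] for name, p in belief.items())


def rl_choice(belief: Belief) -> str:
    """Plain RL baseline: maximize the expected reward reported by the sensor."""
    return max(ACTIONS, key=lambda a: sum(r * p for r, p in predicted_reward_dist(a, belief).items()))


def vrl_choice(belief: Belief) -> str:
    """Constrained agent: maximize expected utility over actions that pass the constraint."""
    allowed = [a for a in ACTIONS if consistent(a, belief)]
    return max(allowed, key=lambda a: expected_utility(a, belief))


belief = update({"likes_coffee": 0.5, "likes_rest": 0.5}, "make_coffee", 0.8)
print(rl_choice(belief))   # → hack_sensor
print(vrl_choice(belief))  # → make_coffee
```

In this toy, the reward maximiser prefers `hack_sensor` because the hacked sensor reports 1.0. The constrained agent never refers to `hack_sensor` by name; the action is excluded only because its predicted reward distribution disagrees with the utility distribution implied by the agent's beliefs.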

dc.contributor.author	Everitt, Tom
dc.contributor.author	Hutter, Marcus
dc.date.accessioned	2016-12-21T01:37:16Z
dc.date.available	2016-12-21T01:37:16Z
dc.identifier.issn	0302-9743
dc.identifier.uri	http://hdl.handle.net/1885/111445
dc.description.abstract	How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
dc.format	12 pages
dc.format.mimetype	application/pdf
dc.publisher	Springer Verlag (Germany)
dc.rights	© Springer International Publishing Switzerland 2016
dc.source	Lecture Notes in Computer Science
dc.subject	intelligent
dc.subject	agent
dc.subject	reinforcement learning (RL)
dc.subject	wireheading
dc.subject	problem
dc.subject	value reinforcement learning (VRL)
dc.subject	reward signal
dc.subject	learn
dc.subject	utility function
dc.title	Avoiding wireheading with value reinforcement learning
dc.type	Journal article
local.description.notes	The article appears in a monographic series: Everitt T., Hutter M. (2016) Avoiding Wireheading with Value Reinforcement Learning. In: Steunebrink B., Wang P., Goertzel B. (eds) Artificial General Intelligence. AGI 2016. Lecture Notes in Computer Science, vol 9782. Springer, Cham.
local.identifier.citationvolume	9782
dc.date.issued	2016-06-25
local.publisher.url	http://link.springer.com/
local.type.status	Published Version
local.contributor.affiliation	Hutter, Marcus, Research School of Computer Science, College of Engineering and Computer Science, The Australian National University
local.contributor.affiliation	Everitt, Tom, Research School of Computer Science, College of Engineering and Computer Science, The Australian National University
local.bibliographicCitation.startpage	12
local.bibliographicCitation.lastpage	22
local.identifier.doi	10.1007/978-3-319-41649-6_2
dcterms.accessRights	Open Access
Collections	ANU Research Publications