Avoiding wireheading with value reinforcement learning
Date
2016-06-25
Authors
Everitt, Tom
Hutter, Marcus
Publisher
Springer Verlag (Germany)
Abstract
How can we design good goals for arbitrarily intelligent
agents? Reinforcement learning (RL) may seem like a natural approach.
Unfortunately, RL does not work well for generally intelligent agents, as
RL agents are incentivised to shortcut the reward sensor for maximum
reward – the so-called wireheading problem. In this paper we suggest an
alternative to RL called value reinforcement learning (VRL). In VRL,
agents use the reward signal to learn a utility function. The VRL setup
allows us to remove the incentive to wirehead by placing a constraint
on the agent’s actions. The constraint is defined in terms of the agent’s
belief distributions, and does not require an explicit specification of which
actions constitute wireheading.
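As a rough illustration of the idea (not the paper's formal construction), a VRL-style agent can be sketched as maintaining a Bayesian belief over candidate utility functions and treating the reward signal as evidence about which candidate is correct, rather than as the quantity to maximise directly. The candidate utilities, states, and likelihood model below are hypothetical placeholders:

# Toy VRL-style sketch: the reward signal updates a belief over utility
# functions instead of being maximised directly. All names and the
# likelihood model are illustrative assumptions, not from the paper.

# Hypothetical candidate utility functions over states.
candidate_utilities = {
    "u_clean":  lambda state: 1.0 if state == "room_clean" else 0.0,
    "u_charge": lambda state: 1.0 if state == "battery_full" else 0.0,
}

# Prior belief over which candidate is the true utility function.
belief = {name: 0.5 for name in candidate_utilities}

def likelihood(reward, state, utility):
    # Crude evidence model: rewards close to a candidate's utility value
    # are more likely under that candidate.
    return max(1e-6, 1.0 - abs(reward - utility(state)))

def update_belief(belief, state, reward):
    # Bayesian update: the reward is evidence about the utility function.
    posterior = {
        name: belief[name] * likelihood(reward, state, u)
        for name, u in candidate_utilities.items()
    }
    z = sum(posterior.values())
    return {name: p / z for name, p in posterior.items()}

def expected_utility(state, belief):
    # Value of a state under the agent's current belief over utilities.
    return sum(p * candidate_utilities[name](state)
               for name, p in belief.items())

# Example: observe a reward in a state, update the belief, evaluate a state.
state, reward = "room_clean", 1.0
belief = update_belief(belief, state, reward)
print(belief, expected_utility("room_clean", belief))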
Keywords
intelligent agent, reinforcement learning (RL), wireheading problem, value reinforcement learning (VRL), reward signal, utility function
Source
Lecture Notes in Computer Science
Type
Journal article
Access Statement
Open Access