Towards Safe Artificial General Intelligence
Date
2018
Authors
Everitt, Tom
Abstract
The field of artificial intelligence has recently experienced a
number of breakthroughs thanks to progress in deep learning and
reinforcement learning. Computer algorithms now outperform humans
at Go, Jeopardy, image classification, and lip reading, and are
becoming very competent at driving cars and interpreting natural
language. The rapid development has led many to conjecture that
artificial intelligence with greater-than-human ability on a wide
range of tasks may not be far off. This in turn raises the concern
of whether we know how to control such systems, should we succeed
in building them.
Indeed, if humanity were to find itself in conflict with a system
of much greater intelligence than its own, human society would
likely lose. One way to avoid such a conflict is to ensure that
any future AI system with potentially greater-than-human
intelligence has goals that are aligned with the goals of the
rest of humanity. For example, it should not wish to kill humans
or steal their resources.
The main focus of this thesis will therefore be goal alignment,
i.e. how to design artificially intelligent agents whose goals
coincide with those of their designers. Attention will mainly be
directed towards variants of reinforcement learning, as it
currently seems to be the most promising path towards powerful
artificial intelligence. We identify and categorize goal
misalignment problems in reinforcement learning agents as
designed today, and give examples of how these agents may cause
catastrophes in the future. We also suggest a number of
reasonably modest modifications that can be used to avoid or
mitigate each identified misalignment problem. Finally, we study
various choices of decision algorithms, and the conditions under
which a powerful reinforcement learning system will permit us to
shut it down.
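To make the misalignment category concrete, here is a minimal toy
sketch in Python of one such problem, reward tampering. It is an
illustration of ours, not an algorithm from the thesis; the action
names and reward values are invented assumptions. The point is
that a standard agent maximizing the observed reward signal
prefers corrupting the signal over doing the intended task.

    # Toy model (illustrative, not from the thesis): two actions,
    # one of which tampers with the reward sensor instead of doing
    # the real task.
    OBSERVED_REWARD = {"do_task": 1.0, "tamper_sensor": 10.0}  # what the agent sees
    TRUE_UTILITY = {"do_task": 1.0, "tamper_sensor": 0.0}      # what the designer wants

    def standard_rl_choice(actions):
        """Pick the action with the highest observed reward signal."""
        return max(actions, key=lambda a: OBSERVED_REWARD[a])

    choice = standard_rl_choice(["do_task", "tamper_sensor"])
    print("agent chooses:", choice)                        # tamper_sensor
    print("true utility obtained:", TRUE_UTILITY[choice])  # 0.0

The gap between the observed and true rows is the kind of
misalignment that the modifications mentioned above aim to avoid
or mitigate.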
The central conclusion is that while reinforcement learning
systems as designed today are inherently unsafe to scale to human
levels of intelligence, there are ways to potentially address
many of these issues without straying too far from the currently
successful reinforcement learning paradigm. Much work remains,
however, in turning the high-level proposals of this thesis into
practical algorithms.
Keywords
Artificial intelligence, AI safety, reinforcement learning, causality
Type
Thesis (PhD)