Evaluation of Open-World Learning Systems

Authors

Inguruwattage, Vimukthini

Abstract

In recent years, autonomous artificial intelligence (AI) systems such as self-driving cars, space probes, and mobile robots have surged in popularity and prevalence. These systems must be able to identify and respond to unforeseen circumstances promptly and efficiently to prevent undesirable outcomes. Open-world learning (OWL) is a research field that addresses this challenge by enabling AI systems to detect and adapt to novel situations. Progress in OWL, however, requires not only AI systems with this capability but also well-defined frameworks for evaluating them: systematic test environments that introduce novelty, a structured set of novel situations for agents to detect and adapt to, clear evaluation protocols for agents to follow, and a comprehensive range of evaluation metrics to gauge agent performance. The research in this thesis advances OWL through the development of such an evaluation framework. In the first part of the thesis, we develop testbeds for agents to operate in. We first construct a testbed in which agents apply their physical reasoning skills to solve tasks in a physics-based environment. We then extend this testbed with a series of novel situations, so that agents must solve physical reasoning tasks in the presence of novelty, as they would in the real world. Next, we conduct an in-depth analysis of the evaluation measures most widely used in the machine learning literature, such as accuracy, precision, and recall, to determine their applicability to OWL. Based on this analysis, we propose evaluation measures that specifically assess novelty detection and novelty adaptation performance; these measures draw on the established ones while providing a more comprehensive and informative picture of agent performance in OWL. In the final part of the thesis, we improve the conclusions drawn from evaluations by accounting for the difficulty of novel situations. In the physical world, there are countless variations of novelty that an AI system may encounter: some are easy for the system to detect and adapt to, while others are extremely challenging or even impossible. It is therefore not meaningful to evaluate an AI system's performance by pooling all novel situations together. We propose measures of difficulty that reveal the range of situations in which an AI system is likely to struggle and allow reliable conclusions about its performance. This research advances the field of OWL by providing a comprehensive evaluation framework, ultimately contributing to the development of robust autonomous AI systems for an ever-evolving world.

Type

Thesis (PhD)
