Navigating the Embodied Agent: Design and Optimisation of Object-Goal Visual Navigation System
Abstract
Embodied Artificial Intelligence (Embodied AI) represents a significant shift from traditional static computation to dynamic interaction with the environment, in which an agent not only responds to the physical world but also actively shapes it. Within this paradigm, Object-Goal Visual Navigation (ObjNav) is pivotal: it equips robots with the ability to navigate to and locate specific target objects, a capability that underpins the operation of Embodied AI systems. A typical reinforcement-learning-based ObjNav system comprises two primary components: (i) a visual perception module that interprets the scene by extracting navigation cues, such as the target's location and its spatial relationships with surrounding objects; and (ii) a navigation policy module that processes both the current visual input and the navigation history to determine the best action. Together, these components ensure that an ObjNav system not only recognises its target but also formulates an effective navigation strategy, enabling robust and adaptable performance in dynamic real-world environments.
This thesis investigates Object-Goal Visual Navigation systems, focusing on both visual perception and the navigation policy. We enhance visual perception by exploiting the relationships between objects in a scene. For the navigation policy, we first concentrate on preventing it from predicting actions that often lead to failure, and then shift our focus to prioritising the information most relevant to the current navigation step. We achieve this through four key contributions:
In Chapter 3, we enhance object-driven visual navigation using object relation graphs (ORG), trial-driven imitation learning (IL), and a memory-augmented tentative policy network (TPN). ORG improves visual understanding by modelling object relationships. IL and TPN help create robust navigation policies, guiding the agent away from unproductive actions and promoting efficient navigation.
In Chapter 4, we introduce VTNet, a Visual Transformer Network for learning informative visual representations. VTNet emphasises spatial locations and object relationships, using attention mechanisms to produce representations that directly inform navigation decisions. A pre-training scheme aligns these representations with navigation signals, enabling effective policy learning.
In Chapter 5, we address how navigation states affect navigation effectiveness and efficiency by introducing the History-inspired Navigation Policy Learning (HiNL) framework. HiNL exploits historical navigation data to improve current decision-making. It incorporates a History-aware State Estimation (HaSE) module that reduces the dominance of past states and a History-based State Regularisation (HbSR) technique that minimises correlations among states during training. Together, these enable the agent to adapt to changing environments and make better-informed decisions.
In Chapter 6, we introduce the Experience-aware Action Cogitator (ExAC), a framework that leverages Large Language Models (LLMs) to improve decision-making in ObjNav. By integrating insights derived from both expert-informed and trial-and-error experiences, ExAC refines the LLM's decision-making. As a result, the model not only predicts effective navigation actions but also articulates the reasoning behind its choices, enhancing the transparency and reliability of navigation in complex indoor settings.
In conclusion, this thesis tackles multiple open challenges in Embodied AI, particularly in Object-Goal Navigation, demonstrating improvements in both navigation effectiveness and efficiency in unseen environments. Through this work, we aim to foster discussion among researchers in the Embodied AI community and to motivate future efforts and collaborations that push the field toward more challenging real-world problems.