
Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents

Anderson, Peter James


Each time we ask for an object, describe a scene, follow directions or read a document containing images or figures, we are converting information between visual and linguistic representations. Indeed, for many tasks it is essential to reason jointly over visual and linguistic information. People do this with ease, typically without even noticing. Intelligent systems that perform useful tasks in unstructured situations, and interact with people, will also...

Collections: Open Access Theses
Date published: 2018
Type: Thesis (PhD)
DOI: 10.25911/5d00d4ec451cc


File: Anderson Thesis 2019.pdf
Size: 29.39 MB
Format: Adobe PDF

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
