Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents

Anderson, Peter James

Description

Each time we ask for an object, describe a scene, follow directions or read a document containing images or figures, we are converting information between visual and linguistic representations. Indeed, for many tasks it is essential to reason jointly over visual and linguistic information. People do this with ease, typically without even noticing. Intelligent systems that perform useful tasks in unstructured situations, and interact with people, will also...

Collections: Open Access Theses
Date published: 2018
Type: Thesis (PhD)
URI: http://hdl.handle.net/1885/164018
DOI: 10.25911/5d00d4ec451cc

Download

File: Anderson Thesis 2019.pdf
Size: 29.39 MB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated: 22 January 2019 / Responsible Officer: University Librarian / Page Contact: Library Systems & Web Coordinator