Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents
Each time we ask for an object, describe a scene, follow directions or read a document containing images or figures, we are converting information between visual and linguistic representations. Indeed, for many tasks it is essential to reason jointly over visual and linguistic information. People do this with ease, typically without even noticing. Intelligent systems that perform useful tasks in unstructured situations, and interact with people, will also...
Collections: Open Access Theses
File: Anderson Thesis 2019.pdf (29.39 MB, Adobe PDF)