From known to the unknown: Transferring knowledge to answer questions about novel visual and semantic concepts

Authors

Farazi, Moshiur
Khan, Salman Hameed
Barnes, Nick

Publisher

Elsevier

Abstract

Current Visual Question Answering (VQA) systems can answer intelligent questions about ‘known’ visual content. However, their performance drops significantly when questions about visually and linguistically ‘unknown’ concepts are presented during inference (the ‘Open-world’ scenario). A practical VQA system should be able to deal with novel concepts in real-world settings. To address this problem, we propose an exemplar-based approach that transfers learning (i.e., knowledge) from previously ‘known’ concepts to answer questions about the ‘unknown’. We learn a highly discriminative joint embedding (JE) space, where visual and semantic features are fused to give a unified representation. Once novel concepts are presented to the model, it looks for the closest match from an exemplar set in the JE space. This auxiliary information is used alongside the given Image-Question pair to refine visual attention in a hierarchical fashion. Our novel attention model is based on a dual-attention mechanism that combines the complementary effects of spatial and channel attention. Since handling high-dimensional exemplars on large datasets can be a significant challenge, we introduce an efficient matching scheme that uses a compact feature description for search and retrieval. To evaluate our model, we propose a new dataset for VQA that separates unknown visual and semantic concepts from the training set. Our approach shows significant improvements over state-of-the-art VQA models on the proposed Open-World VQA dataset and other standard VQA datasets.
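
The exemplar lookup described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the fusion operator or the distance measure, so the element-wise product in `fuse` and the cosine similarity in `nearest_exemplar` are assumptions chosen for illustration, and all function names and toy vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(visual, semantic):
    """Fuse visual and semantic features into one joint-embedding vector.

    Assumption: element-wise product followed by L2 normalization.
    The abstract only says the features are "fused".
    """
    if len(visual) != len(semantic):
        raise ValueError("visual and semantic features must have equal length")
    return l2_normalize([v * s for v, s in zip(visual, semantic)])


def nearest_exemplar(query, exemplars):
    """Return (index, similarity) of the exemplar closest to the query.

    Assumption: cosine similarity, which reduces to a dot product
    because every vector is already unit length.
    """
    best_index, best_score = -1, -math.inf
    for i, exemplar in enumerate(exemplars):
        score = sum(q * e for q, e in zip(query, exemplar))
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Toy exemplar set built from 'known' concepts (made-up 3-d features).
semantic = [1.0, 1.0, 1.0]
exemplars = [
    fuse([1.0, 0.0, 0.0], semantic),
    fuse([0.0, 1.0, 0.0], semantic),
    fuse([0.0, 0.0, 1.0], semantic),
]

# A query about an 'unknown' concept retrieves its closest known exemplar.
query = fuse([0.1, 0.9, 0.2], semantic)
index, score = nearest_exemplar(query, exemplars)
```

A linear scan like this is only practical for small exemplar sets; the abstract notes that the paper uses a compact feature description to make search and retrieval efficient on large datasets.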

Source

Image and Vision Computing

Restricted until

2099-12-31