Deep0Tag: Deep Multiple Instance Learning for Zero-Shot Image Tagging
Date
2020
Authors
Rahman, Shafin
Khan, Salman Hameed
Barnes, Nick
Publisher
Institute of Electrical and Electronics Engineers (IEEE Inc)
Abstract
Zero-shot learning aims to perform visual reasoning
about unseen objects. In line with the success of deep learning on
object recognition problems, several end-to-end deep models for
zero-shot recognition have been proposed in the literature. These
models are successful in predicting a single unseen label given
an input image but do not scale to cases where multiple unseen
objects are present. Here, we focus on the challenging problem
of zero-shot image tagging, where an image is assigned multiple
labels that may relate to objects, attributes, actions, events,
and scene type. Discovery of these scene concepts requires the
ability to process multi-scale information. To encompass global
as well as local image details, we propose an automatic approach
to locate relevant image patches and model image tagging within
the Multiple Instance Learning (MIL) framework. To the best of
our knowledge, we propose the first end-to-end trainable deep
MIL framework for the multi-label zero-shot tagging problem. We
explore several alternatives for instance-level evidence aggregation
and perform an extensive ablation study to identify the optimal
pooling strategy. Due to its novel design, the proposed framework
has several interesting features: 1) Unlike previous deep MIL
models, it does not use any offline procedure (e.g., Selective Search
or EdgeBoxes) for bag generation. 2) At test time, it can process
any number of unseen labels given their semantic embedding
vectors. 3) Using only image-level seen labels as weak annotation,
it can produce a localized bounding box for each predicted label.
We experiment with the large-scale NUS-WIDE and MS-COCO
datasets and achieve superior performance across conventional,
zero-shot, and generalized zero-shot tagging tasks.
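The instance-level evidence aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each image patch (instance) has already been mapped into the same space as the label embedding vectors, scores each instance against each label by dot product, and pools those scores into one image-level (bag-level) score per label. The function names (`instance_scores`, `pool`, `best_instance`), the sharpness parameter `r`, and the three pooling modes shown are illustrative assumptions, not the set of alternatives studied in the paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def instance_scores(instances, label_vectors):
    """Score every instance (patch embedding) against every label vector.

    Returns a list of rows, one per instance, with one score per label.
    Assumption: instances and label vectors share one embedding space.
    """
    return [[dot(x, w) for w in label_vectors] for x in instances]


def pool(scores, mode="max", r=5.0):
    """Aggregate instance-level scores into one bag-level score per label."""
    n_labels = len(scores[0])
    columns = [[row[j] for row in scores] for j in range(n_labels)]
    if mode == "max":
        return [max(c) for c in columns]
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    if mode == "lse":
        # Log-sum-exp pooling: a smooth value between mean and max.
        # Subtracting the column maximum keeps exp() numerically stable.
        pooled = []
        for c in columns:
            m = max(c)
            avg = sum(math.exp(r * (s - m)) for s in c) / len(c)
            pooled.append(m + math.log(avg) / r)
        return pooled
    raise ValueError(f"unknown pooling mode: {mode}")


def best_instance(scores):
    """Index of the highest-scoring instance for each label.

    If each instance corresponds to an image patch, this index points at
    the patch that most supports the label (a rough localization cue).
    """
    n_instances = len(scores)
    n_labels = len(scores[0])
    return [
        max(range(n_instances), key=lambda i: scores[i][j])
        for j in range(n_labels)
    ]


# Toy usage: three patches, two labels. Adding more label vectors at test
# time needs no change to the functions above.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = [[1.0, 0.0], [0.0, 2.0]]
scores = instance_scores(patches, labels)
bag_scores = pool(scores, mode="max")
```

Because the label vectors are passed in as plain arguments, scoring a label that was never seen in training only requires supplying its embedding vector, which is the property the abstract describes for test time.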
Keywords
Deep learning, Multiple instance learning, Feature pooling, Object detection, Zero-shot tagging
Source
IEEE Transactions on Multimedia
Type
Journal article
Restricted until
2099-12-31