Zero-shot Learning: Recognition, Tagging, and Detection of Novel Concepts

dc.contributor.author: Rahman, Shafin
dc.date.accessioned: 2020-05-14T11:52:50Z
dc.date.available: 2020-05-14T11:52:50Z
dc.date.issued: 2020
dc.description.abstract: Recent advancements in deep neural networks have yielded strong performance on supervised object recognition tasks. Towards an ultimate automated visual recognition system, we identify three key shortcomings of existing supervised learning approaches. First, the dependency on a significantly large volume of manually annotated examples (e.g., the ImageNet dataset with ~10 million images) limits the scalability of deep networks. Second, once the model learning stage is complete, it is difficult to continually add new classes as new data becomes available. Lastly, such models lack the notion of human-like understanding, i.e., a human can recognize an object without any visual examples, just from a semantic description of its distinctive characteristics. In this thesis, we investigate the zero-shot learning (ZSL) framework to address these limitations. ZSL aims to reason about previously unseen objects without observing even a single instance of them. Such a learning paradigm requires no visual examples of novel objects, needs no re-training to add new classes, and does not rely solely on visual information. By relating the semantic descriptions of previously seen classes, ZSL incorporates human wisdom into the visual understanding developed by a machine. In this work, we specifically address three critical bottlenecks in ZSL research, which give rise to three ZSL problem settings: (a) unified zero-shot recognition (ZSR), (b) zero-shot tagging (ZST), and (c) zero-shot detection (ZSD) of novel concepts. Established ZSR methods are not flexible enough to adapt to the one/few-shot learning (O/FSL) scenario, where one or a few labeled examples of unseen classes become available during the supervised learning stage.
To provide a comprehensive and flexible solution, we present a novel 'unified' approach for ZSL and O/FSL based on the class adapting principal direction (CAPD), which computes class-specific discriminative information by relating the semantic descriptions of categories. The primary objective is to learn a metric in the semantic embedding space that minimizes intra-class distances and maximizes inter-class distances. In real-life scenarios, instead of a single object per image, a scene may contain multiple seen and unseen concepts together. To accommodate this, we present the Deep0Tag approach for zero-shot tagging (ZST), which assigns multiple labels to an input. This method considers both global and local details to discover seen or unseen concepts in a given scene. We solve this problem by formulating a multiple instance learning (MIL) framework. Unlike traditional MIL solutions, our method runs end-to-end without relying on offline object proposal generation. While most ZSL methods address unseen categories only in simple tasks, e.g., single- or multi-label classification and retrieval, we also focus on predicting both the multi-class category label and the precise location of each instance in a given image. To this end, we introduce a new challenge for ZSL called zero-shot detection (ZSD), which simultaneously recognizes and localizes multiple novel concepts. Following traditional object detection, we present zero-shot versions of two-stage (Faster R-CNN) and single-stage (RetinaNet) end-to-end object detectors. In both cases, we design associated loss functions that consider visual-semantic relationships to train the network. In addition to inductive learning approaches, we also propose the first transductive learning method for ZSD, which substantially reduces the domain shift and model bias against unseen classes. Our transductive approach follows a self-learning mechanism that uses a novel hybrid pseudo-labeling technique.
Finally, we recommend training and testing protocols to evaluate ZSD on the large-scale ILSVRC-2017 and MSCOCO-2014 datasets.
dc.identifier.other: b71498229
dc.identifier.uri: http://hdl.handle.net/1885/204349
dc.language.iso: en_US
dc.title: Zero-shot Learning: Recognition, Tagging, and Detection of Novel Concepts
dc.type: Thesis (PhD)
local.contributor.authoremail: u5929575@anu.edu.au
local.contributor.supervisor: Khan, Salman
local.contributor.supervisorcontact: u1029115@anu.edu.au
local.identifier.doi: 10.25911/5ee74e8179e71
local.identifier.proquest: Yes
local.identifier.researcherID: N-1939-2019
local.mintdoi: mint
local.thesisANUonly.author: b5451b9b-c89b-4a72-9cc1-f80695647640
local.thesisANUonly.key: 61fbf382-c7db-32ff-f2eb-18ca468746a4
local.thesisANUonly.title: 000000015657_TC_1

Downloads

Original bundle

Name: Thesis_Shafin.pdf
Size: 12.03 MB
Format: Adobe Portable Document Format
Description: Thesis Material