Deep Zero- and Few-shot Learning in Computer Vision

Authors

Zhang, Hongguang

Abstract

Most CNN models rely on large-scale annotated training data, and their performance degrades when the training set is small; as a result, numerous large-scale datasets have been proposed for vision tasks. Although training on a large-scale dataset can significantly improve performance, creating such datasets for novel scenarios is costly. Moreover, depending on sheer sample counts is inconsistent with human learning abilities, as humans can understand novel concepts from only a few examples. Transfer learning is the machine learning topic that addresses these limitations of current deep models: it studies how machines can exploit previously acquired knowledge to solve new problems, learning by generalizing from existing to new concepts, a trait akin to human learning. Empirically, transfer learning for computer vision comprises the following subtopics: i) zero-shot learning, ii) few-shot learning, and iii) domain adaptation. In this thesis, we study zero-shot and few-shot learning to demonstrate how to learn better from limited training samples. Zero-shot learning requires learning a mapping that associates feature vectors extracted from images with semantic annotations describing objects and/or scenes of interest. Our first work addresses the weakness of the linear mappings used in previous zero-shot learning models by learning non-linear kernelized projections between the feature and attribute spaces to improve accuracy. We propose a simple learning objective inspired by Linear Discriminant Analysis, combined with an incoherence mechanism. The dominant challenge in the generalized zero-shot setting is the imbalanced data distribution between training and testing, which makes it difficult for a classifier to decide whether a test sample comes from a seen or an unseen class.
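The non-linear kernelized projection mentioned above can be illustrated with a minimal sketch. Note this is an assumption-laden toy, not the thesis's actual objective: it uses plain RBF kernel ridge regression to map image features into the attribute space, then assigns a test image to the class (seen or unseen) whose attribute vector is nearest to the projection.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise RBF kernel between rows of A and rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

def fit_kernel_projection(X, S, lam=1e-2, gamma=0.5):
    # Kernel ridge regression from image features X (n x d) to the
    # attribute vectors S (n x a) of their classes:
    # alpha = (K + lam * I)^-1 S  (a non-linear projection, unlike
    # the linear mappings of earlier zero-shot models).
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), S)

def predict(X_train, alpha, X_test, class_attrs, gamma=0.5):
    # Project test features into attribute space, then pick the class
    # (possibly an unseen one) with the nearest attribute vector.
    S_hat = rbf_kernel(X_test, X_train, gamma) @ alpha
    d = np.linalg.norm(S_hat[:, None, :] - class_attrs[None, :, :], axis=2)
    return d.argmin(1)
```

Because unseen classes only need attribute vectors, not images, the same nearest-attribute rule extends to classes absent from training, which is the core of the zero-shot setting.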
Inspired by GAN-based models, we propose a model selection mechanism in which we leverage two sources of datapoints to train a model selection classifier that recognizes which test datapoints come from seen classes and which from unseen ones. This way, generalized zero-shot learning can be decoupled into two disjoint classification tasks. Few-shot learning is a variant of knowledge transfer formulated as a meta-learning process. First, we investigate so-called second-order pooling and Power Normalizations on fine-grained classification tasks to study how best to use co-occurrence statistics for such problems. Building on these experiments and analyses, we propose a novel second-order similarity network, which measures image relations via co-occurrence relation descriptors and significantly improves few-shot learning performance. Hallucinating auxiliary training samples is another popular and effective way to improve few-shot learning accuracy. To this end, we propose a saliency-guided hallucination strategy: a saliency detector segments foregrounds and backgrounds of the images forming an episode, and we then mix every foreground with all available backgrounds in the convolutional feature space to synthesize descriptors representing objects in various scenes. Furthermore, we note that previous few-shot learning methods fail to address the scale and location mismatch between support and query objects, which leads to a loss in performance. Inspired by spatial pyramid matching, we propose a novel spatial- and scale-matching network that explicitly matches support-query pairs over different scales and locations. Last but not least, we consider a more challenging scenario, few-shot action recognition, to study knowledge transfer in the video domain. Apart from spatial information, robust aggregation over the temporal mode is fundamental in this task.
To address this issue, we propose a novel action relation network with permutation-invariant attention and auxiliary self-supervision tasks.
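The co-occurrence statistics underlying the second-order pooling discussed above admit a brief generic sketch (a textbook autocorrelation pooling with signed-square-root Power Normalization, not the thesis's exact operator): local convolutional descriptors of an image are aggregated into an outer-product matrix whose entries capture how pairs of features co-occur, and two images can then be related by comparing their pooled matrices.

```python
import numpy as np

def second_order_pool(phi):
    # phi: N x d matrix of local descriptors, e.g. convolutional
    # features at N spatial locations of one image.
    M = phi.T @ phi / len(phi)               # d x d co-occurrence matrix
    M = np.sign(M) * np.sqrt(np.abs(M))      # Power Normalization
    return M / (np.linalg.norm(M) + 1e-12)   # unit Frobenius norm

def relation(M1, M2):
    # A simple symmetric relation score between two pooled images,
    # e.g. for support-query comparison in few-shot learning.
    return float(np.sum(M1 * M2))
```

The Power Normalization step dampens bursty co-occurrences so that a few dominant feature pairs do not overwhelm the descriptor, which is what makes such statistics usable for fine-grained and few-shot comparison.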
