Adaptation in Deep Learning Models: Algorithms and Applications

Authors

Simon, Christian

Abstract

Artificial intelligence has matched or even surpassed human abilities in tasks such as recognizing images, playing games, and understanding language. At present, powerful machine learning models learn from data in a stationary environment, whereas humans can learn under dynamic, changing, and sequential conditions. In pursuit of open-ended learning for machine intelligence, we contribute algorithms and analyses for generally capable models via adaptation. In this thesis, the model adaptation problem is defined as the difficulty intelligent machines face in learning to modify their behavior for new purposes or new uses. The ultimate goal is to develop machine intelligence that can adapt not only by following our instructions but also by understanding its environment. Our work lies in the area of deep neural networks and transfer learning. Throughout this thesis, developing adaptive models is divided into four major problems: (1) few-shot learning, (2) fast model adaptation, (3) continual learning, and (4) architecture search. In few-shot learning, a model is expected to change its behavior when facing a new context or an unseen task with limited data. A closely related problem is adapting quickly from only a few examples. In continual learning, the model must adapt sequentially as tasks arrive. In architecture search, we look for a high-performing configuration of connections among the nodes of a model. To approach few-shot learning, we adopt a transfer learning strategy: a pretrained Convolutional Neural Network (CNN) is applied to novel tasks with limited annotated data. Inspired by the success of subspace methods in visual recognition, we develop a subspace-based classifier to improve generalization to novel concepts.
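A subspace classifier of the kind described above can be sketched roughly as follows: each class gets an orthonormal basis fitted to its few support embeddings, and a query is assigned to the class whose subspace leaves the smallest projection residual. This is a minimal illustration, not the thesis's exact formulation; the function names, the SVD-based basis construction, and the residual-distance rule are assumptions for the sketch.

```python
import numpy as np

def class_subspace(support, k=2):
    """Fit a (d, k) orthonormal basis to one class's support embeddings.

    support: (n_shot, d) array of feature vectors for one class.
    The basis spans the top-k variance directions around the class mean.
    """
    mu = support.mean(axis=0)
    # SVD of the centered features; right singular vectors give directions.
    _, _, vt = np.linalg.svd(support - mu, full_matrices=False)
    return mu, vt[:k].T  # mean (d,), basis (d, k)

def classify(query, subspaces):
    """Assign the query to the class whose subspace reconstructs it best."""
    residuals = []
    for mu, basis in subspaces:
        centered = query - mu
        proj = basis @ (basis.T @ centered)          # projection onto subspace
        residuals.append(np.linalg.norm(centered - proj))
    return int(np.argmin(residuals))                 # smallest residual wins
```

The residual distance, rather than a plain nearest-mean rule, is what lets the classifier exploit the within-class variation captured by the few support samples.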
We also investigate few-shot learning for multi-label classification and propose a multi-label propagation technique that constructs a graph from the representations of support samples. In pursuing fast model adaptation, we draw on the idea of preconditioners from optimization. Specifically, the problem arises in meta-learning, where the agent must learn a family of tasks and adapt quickly to a new one. Our algorithm uses a non-linear function to generate the preconditioner that modulates the gradient when updating the model. Our experiments show that the model converges more quickly than with other types of preconditioners on the same problem. In continual learning, the model must sequentially learn and adapt its parameters to new tasks without forgetting previously learned ones. To this end, we investigate a knowledge distillation approach in which the old model guides the current model to balance the current task against prior tasks. Our approach models the smoothness between two tasks using the geodesic flow, and the objective is to maximize the similarity of the projected responses along this flow. In neural architecture search, the optimal architecture depends on the task objectives. We observe that searching for an optimal architecture is not trivial when the data annotations are noisy. We study the impact of label noise on optimizing a neural architecture for the best performance, while also reducing the performance deterioration caused by overfitting to noisy labels. We use the mutual information bottleneck to design a noise injection module that alleviates the impact of learning under label noise. In summary, the work in this thesis addresses major problems in model adaptation: few-shot learning, meta-learning, continual learning, and neural architecture search.
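The preconditioned update described above can be illustrated with a minimal sketch: a small non-linear function maps each gradient coordinate to a positive scale, so the inner-loop step uses a modulated gradient instead of the raw one. The two-layer elementwise network, the softplus positivity trick, and all parameter names here are assumptions for illustration, not the thesis's actual preconditioner.

```python
import numpy as np

def precondition(grad, w1, w2):
    """Hypothetical non-linear preconditioner: an elementwise two-layer
    network maps each gradient coordinate to a positive scaling factor,
    i.e., a learned diagonal preconditioning of the update direction."""
    h = np.tanh(np.outer(grad, w1))      # (d, hidden) non-linear features
    scale = np.log1p(np.exp(h @ w2))     # softplus keeps every scale > 0
    return scale * grad                  # modulated (preconditioned) gradient

def inner_update(theta, grad, w1, w2, lr=0.1):
    """One preconditioned inner-loop step, in place of a plain SGD step."""
    return theta - lr * precondition(grad, w1, w2)
```

Keeping the scales positive preserves the sign of every gradient coordinate, so each preconditioned step remains a descent direction while the per-coordinate step sizes are adapted non-linearly.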
The solutions are expected to contribute to the arsenal of model adaptation algorithms, and the analyses shed light on essential aspects of adaptation strategies.
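The continual-learning objective above, matching old- and new-model responses along a path between two subspaces, can be sketched as follows. As a rough stand-in for the geodesic flow, this sketch interpolates linearly between the two orthonormal bases and re-orthonormalizes with QR at each step; the function names and the cosine-similarity loss are assumptions for illustration.

```python
import numpy as np

def interpolated_projections(b_old, b_new, steps=4):
    """Intermediate subspaces on a path from the old model's subspace to
    the new one (linear interpolation + QR, a simplification of the
    geodesic flow between subspaces)."""
    for t in np.linspace(0.0, 1.0, steps):
        q, _ = np.linalg.qr((1 - t) * b_old + t * b_new)
        yield q  # (d, k) orthonormal basis at "time" t

def flow_distillation_loss(f_old, f_new, b_old, b_new, steps=4):
    """Average (1 - cosine similarity) between old- and new-model
    responses after projecting both onto each intermediate subspace."""
    losses = []
    for q in interpolated_projections(b_old, b_new, steps):
        p_old, p_new = q.T @ f_old, q.T @ f_new
        cos = (p_old @ p_new) / (np.linalg.norm(p_old) * np.linalg.norm(p_new) + 1e-8)
        losses.append(1.0 - cos)
    return float(np.mean(losses))
```

Minimizing this loss pulls the new model's responses toward the old model's not just in one feature space but along the whole path between the two tasks' subspaces, which is the smoothness idea the distillation objective relies on.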
