Deep Learning based Domain Adaptation

Tas, Yusuf

Date

2021

Abstract

Recent advances in Deep Learning (DL) have helped researchers achieve remarkable results in various areas of Machine Learning (ML) and Computer Vision (CV). Starting with the seminal work of [Krizhevsky et al., 2012a], who used the processing power of graphics processing units (GPUs) to make training large networks feasible in terms of training time, DL has since been applied to a wide range of ML and CV problems. Object detection and semantic segmentation [Girshick et al., 2014a; Girshick, 2015; Ren et al., 2015], image super-resolution [Dong et al., 2015], and action recognition [Simonyan and Zisserman, 2014a] are a few examples. Over the years, many new and powerful DL architectures have been proposed; VGG [Simonyan and Zisserman, 2014b], GoogLeNet [Szegedy et al., 2015], and ResNet [He et al., 2016] are among the most commonly used network architectures in the literature.

Our focus is the specific task of Supervised Domain Adaptation (SDA) using Deep Learning. SDA is a type of domain adaptation in which both the source and the target domains contain annotated data. First, we approach SDA as a domain alignment problem. We propose a mixture-of-alignments approach based on second- or higher-order scatter statistics between the source and target domains. Because the domains differ, each class has two distinct representations, one in the source domain and one in the target domain. The proposed approach reduces within-class scatter to align the same classes across the source and target domains while maintaining between-class separation. To implement this within-class alignment, we design and construct a two-stream Convolutional Neural Network (CNN) in which one stream receives source data and the other receives target data from matching classes. The two-stream network is trained end-to-end together with the alignment losses.

Next, we propose a new dataset for SDA research, the Open Museum Identification Challenge (Open MIC).
The Office dataset [Saenko et al., 2010a] is commonly used in the SDA literature, but its main drawback is that results on it have saturated, exceeding 90% accuracy. The limited number of images is one of the main reasons for these high accuracies. Open MIC aims to provide a large SDA dataset with challenging tasks that remain to be addressed. We also extend our mixture-of-alignments loss from the Frobenius norm distance to Bregman divergences and the Riemannian metric, so that the alignment can be learned in different feature spaces.

In the next study, we propose a new representation for the action recognition problem that encodes 3D body skeleton data into texture-like images using kernel methods. We use these representations in our two-stream SDA CNN pipeline. We extend our mixture-of-alignments losses to partially overlapping datasets, which lets us use other available action recognition datasets as additional source domains even if their classes only partially overlap with the target set.

Finally, we move to a more challenging domain adaptation problem: multimodal conversation systems. The Multimodal Dialogue (MMD) dataset [Saha et al., 2018] provides dialogues between a shopper and a retail agent. In these dialogues, the retail agent may also respond with specific retail items such as clothes and shoes. Hence, the conversation is multimodal: utterances can contain both text and images. Two-level RNN encoders are used to encode a given context of utterances. We propose a new approach to this problem that adapts additional data from external domains. To improve the text generation capabilities of the model, we use the French translation of the target sentences as an additional output target. To improve the image ranking capabilities of the model, we use an external dataset and find the nearest neighbors of the positive and negative target images. We then design new encoding methods for these nearest neighbors so that they are assigned to the correct target class, positive or negative.
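
The within-class alignment behind the mixture-of-alignments approach can be illustrated with a small Python/NumPy sketch. It computes a loss value (no gradients) that pulls together the class means and second-order scatter matrices of matching source and target classes, and adds a hinge penalty that keeps different class centers apart. Restricting the loop to classes present in both domains mirrors the partially overlapping setting described above. The function names, the hinge form of the between-class term, and the `margin` and `beta` parameters are hypothetical; this is not the exact loss used in this work.

```python
import numpy as np


def class_scatter(x: np.ndarray) -> np.ndarray:
    """Second-order scatter (sample covariance) of the rows of x, shape (d, d)."""
    centered = x - x.mean(axis=0, keepdims=True)
    return centered.T @ centered / max(len(x) - 1, 1)


def mixture_alignment_loss(fs, ys, ft, yt, margin=1.0, beta=0.1):
    """Hypothetical mixture-of-alignments loss (value only, no gradients).

    fs, ft: source / target features, shape (n_s, d) and (n_t, d).
    ys, yt: integer class labels for each row.
    Only classes present in BOTH domains are aligned, so partially
    overlapping label sets are handled by ignoring unshared classes.
    """
    shared = np.intersect1d(np.unique(ys), np.unique(yt))
    align = 0.0
    centers = []
    for c in shared:
        xs, xt = fs[ys == c], ft[yt == c]
        mu_s, mu_t = xs.mean(axis=0), xt.mean(axis=0)
        # Within-class alignment: pull the source and target representations
        # of class c together (squared mean gap + squared Frobenius scatter gap).
        align += np.sum((mu_s - mu_t) ** 2)
        align += np.sum((class_scatter(xs) - class_scatter(xt)) ** 2)
        centers.append(0.5 * (mu_s + mu_t))
    # Between-class separation: penalize pairs of class centers that are
    # closer than `margin` in squared Euclidean distance.
    sep = 0.0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            d2 = np.sum((centers[i] - centers[j]) ** 2)
            sep += max(0.0, margin - d2)
    return float(align + beta * sep)
```

In a two-stream network, a term like this would be computed with an autodiff framework on the features produced by each stream and added to the classification losses, so that both streams are trained end-to-end.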
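
The alignment terms compare scatter matrices, and the choice of distance determines the geometry in which they are aligned. The sketch below contrasts the Frobenius norm distance with two common choices for symmetric positive definite (SPD) matrices: the Jensen-Bregman LogDet divergence (JBLD), a symmetrized divergence generated by the LogDet function, and the affine-invariant Riemannian metric (AIRM). Which specific Bregman divergence and Riemannian metric this work uses is an assumption here, as are the helper names and the `eps` regularizer that keeps the scatter matrices strictly SPD.

```python
import numpy as np


def frobenius_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (Frobenius norm) distance between two scatter matrices."""
    return float(np.linalg.norm(a - b, "fro"))


def jbld(a: np.ndarray, b: np.ndarray) -> float:
    """Jensen-Bregman LogDet divergence: logdet((a+b)/2) - 0.5*logdet(a b)."""
    _, logdet_mid = np.linalg.slogdet(0.5 * (a + b))
    _, logdet_a = np.linalg.slogdet(a)
    _, logdet_b = np.linalg.slogdet(b)
    return float(logdet_mid - 0.5 * (logdet_a + logdet_b))


def airm(a: np.ndarray, b: np.ndarray) -> float:
    """Affine-invariant Riemannian distance ||log(a^{-1/2} b a^{-1/2})||_F."""
    chol_inv = np.linalg.inv(np.linalg.cholesky(a))
    # chol_inv @ b @ chol_inv.T is symmetric and has the same
    # eigenvalues as a^{-1} b, so eigvalsh can be used.
    lam = np.linalg.eigvalsh(chol_inv @ b @ chol_inv.T)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def regularized_scatter(x: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Hypothetical helper: covariance of the rows of x plus eps * I (strictly SPD)."""
    c = np.cov(x, rowvar=False)
    return c + eps * np.eye(c.shape[0])
```

Unlike the Frobenius distance, both JBLD and AIRM are unchanged when the two matrices are transformed as W S Wᵀ by the same invertible W, which is one reason to align scatter statistics with SPD-aware measures.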
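
To make the idea of turning a 3D skeleton sequence into a texture-like image more concrete, the following hypothetical sketch evaluates a Gaussian RBF kernel between every joint coordinate and a fixed grid of pivot values, then lays the responses out as an image whose rows are (joint, pivot) pairs, whose columns are frames, and whose three channels are the x, y, and z axes. The pivot grid, `sigma`, the [0, 1] coordinate normalization, and the layout are illustrative assumptions, not the representation proposed in this work.

```python
import numpy as np


def skeleton_to_kernel_image(seq: np.ndarray, num_pivots: int = 16,
                             sigma: float = 0.1) -> np.ndarray:
    """Hypothetical encoding of a skeleton sequence as a texture-like image.

    seq: joint coordinates of shape (T, J, 3), assumed normalized to [0, 1].
    Returns an array of shape (J * num_pivots, T, 3): rows index
    (joint, pivot) pairs, columns index frames, channels index x/y/z.
    """
    pivots = np.linspace(0.0, 1.0, num_pivots)          # (P,)
    diff = seq[..., None] - pivots                      # (T, J, 3, P)
    # Gaussian RBF response of each coordinate to each pivot value.
    feat = np.exp(-diff ** 2 / (2.0 * sigma ** 2))      # (T, J, 3, P)
    t, j = seq.shape[0], seq.shape[1]
    # Reorder to (J, P, T, 3), then flatten joint and pivot into image rows.
    return feat.transpose(1, 3, 0, 2).reshape(j * num_pivots, t, 3)
```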
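
The nearest-neighbor step for image ranking can be sketched as label transfer by cosine similarity: each positive or negative target image retrieves its `k` closest images from an external collection, and those neighbors inherit its label. The image embeddings, the cosine similarity, `k`, and the function names are assumptions, and the sketch does not reproduce the encoding methods designed in this work for the neighbors.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of x to unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def transfer_neighbor_labels(pos, neg, pool, k=2):
    """Hypothetical helper: label the k nearest external images of each target image.

    pos, neg: embeddings of positive / negative target images, shape (n, d).
    pool: embeddings of an external image collection, shape (m, d).
    Returns (pool indices, labels) with label 1 for neighbors of positive
    images and 0 for neighbors of negative images.
    """
    pool_n = l2_normalize(pool)
    indices, labels = [], []
    for anchors, label in ((pos, 1), (neg, 0)):
        sims = l2_normalize(anchors) @ pool_n.T       # cosine similarities
        nearest = np.argsort(-sims, axis=1)[:, :k]    # top-k per anchor
        for row in nearest:
            indices.extend(row.tolist())
            labels.extend([label] * len(row))
    return np.array(indices), np.array(labels)
```

A pool image can end up near both a positive and a negative target image; how such conflicts are resolved is left open in this sketch.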
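
Using the French translation of the target sentences as an additional output target amounts to a multi-task objective. A minimal sketch, assuming the target responses are English and a second decoder head produces logits for their French translations, is a weighted sum of two token-level cross-entropy losses; `fr_weight` and both function names are hypothetical.

```python
import numpy as np


def token_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean token-level cross-entropy; logits (T, V), targets (T,) token ids."""
    z = logits - logits.max(axis=1, keepdims=True)      # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def multi_target_loss(en_logits, en_targets, fr_logits, fr_targets,
                      fr_weight=0.5):
    """Hypothetical objective: main response loss plus weighted French-target loss."""
    return (token_cross_entropy(en_logits, en_targets)
            + fr_weight * token_cross_entropy(fr_logits, fr_targets))
```

The auxiliary term only shapes training; at inference time only the main response decoder would be used.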