Loss-Augmented Structured Learning for Semantic Labeling

Rahmatollahi Namin, Shahin

Date

2019

Authors

Rahmatollahi Namin, Shahin

Abstract

Semantic segmentation is among the most significant applications in computer vision. Its goal is to understand a scene by learning to classify different regions of the scene into meaningful predefined classes. Probabilistic graphical models and deep learning are among the most successful approaches to this task. Research in the field has largely focused on proposing new models and loss functions, incorporating attention, or combining models to obtain better results. In contrast, this thesis concentrates on parameter learning for semantic segmentation models: to obtain the best performance from a given scheme, a set of unknown model parameters must be learned. Structured learning is the task of learning models whose parts or objects may be dependent and highly correlated. Among existing approaches, the structured support vector machine (SSVM) is one of the best-known discriminative structured learning frameworks. It offers flexibility in defining user-specified loss functions of the true and predicted labels, task-specific joint features that depend on the predicted label and the input features, and joint input-output score functions. This research addresses the problem of semantic segmentation and uses the SSVM to improve performance over other conventional approaches. We employ the SSVM for parameter learning in both probabilistic graphical models and deep learning models, the two most commonly used schemes for semantic segmentation, so that both benefit from the merits of the SSVM framework. We first propose a Latent SSVM algorithm for parameter learning in graphical models.
This algorithm incorporates the probabilities of the unobserved parts of the scene, in contrast to the SSVM, which predicts these nodes and treats them as deterministic. The experimental results show promising performance of the proposed Latent SSVM approach in the presence of noise and in the absence of fully observed training data. The proposed algorithm can be used to train Conditional Random Field (CRF) parameters for semantic segmentation when the training data for a scene are only partially labeled. Next, we explore more complicated loss functions in the SSVM framework for multi-modal semantic segmentation, which employs different sensors and consequently different observations of a scene: here, the modalities are 2D color images and the corresponding 3D point clouds of a scene. We use higher-order loss functions for parameter learning in graphical models and CRFs for multi-modal semantic segmentation. In this way, the framework allows the user's concerns to be expressed explicitly in the loss function and accounted for in the learning step. This yields better results than contemporary algorithms targeting multi-modal datasets. Last, we propose an approach that applies the SSVM framework to the learning of deep networks. This general framework allows the SSVM to be used in conjunction with other existing loss functions for learning in deep networks, and thereby yields performance improvements for deep networks as well. We apply the proposed framework to contemporary deep architectures for semantic segmentation and show that it is beneficial in learning deep networks.
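
As an illustration of the loss-augmented learning the abstract refers to, the following is a minimal sketch of the standard margin-rescaled structured hinge loss on a tiny chain of nodes. It is not the thesis's method: the joint feature map, the Hamming loss, the brute-force inference, and all function names (`joint_feature`, `hamming_loss`, `loss_augmented_inference`, `structured_hinge`) are assumptions chosen only to keep the example self-contained.

```python
import itertools
import numpy as np

def joint_feature(x, y, num_classes):
    """Hypothetical joint feature map phi(x, y): per-class sums of node
    features, plus a count of neighbouring nodes that share a label."""
    n, d = x.shape
    unary = np.zeros((num_classes, d))
    for i, label in enumerate(y):
        unary[label] += x[i]
    agree = sum(1.0 for a, b in zip(y[:-1], y[1:]) if a == b)
    return np.concatenate([unary.ravel(), [agree]])

def hamming_loss(y_true, y_pred):
    """Task loss: number of mislabeled nodes (an assumed choice)."""
    return float(sum(a != b for a, b in zip(y_true, y_pred)))

def loss_augmented_inference(w, x, y_true, num_classes):
    """argmax over y of loss(y_true, y) + w . phi(x, y).
    Brute force over all labelings, so only feasible for tiny problems."""
    best, best_val = None, -np.inf
    for y in itertools.product(range(num_classes), repeat=len(y_true)):
        val = hamming_loss(y_true, y) + w @ joint_feature(x, y, num_classes)
        if val > best_val:
            best, best_val = y, val
    return best, best_val

def structured_hinge(w, x, y_true, num_classes):
    """Margin-rescaled hinge loss, the most violating labeling, and a
    subgradient with respect to w."""
    y_hat, aug = loss_augmented_inference(w, x, y_true, num_classes)
    phi_true = joint_feature(x, y_true, num_classes)
    loss = aug - w @ phi_true
    subgrad = joint_feature(x, y_hat, num_classes) - phi_true
    return loss, y_hat, subgrad

# Toy usage: three nodes, two features per node, two classes.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
y_true = (0, 1, 0)
w = np.zeros(2 * 2 + 1)
loss, y_hat, subgrad = structured_hinge(w, x, y_true, num_classes=2)
```

With zero weights the score term vanishes, so the most violating labeling is simply the one that maximizes the task loss. In a realistic segmentation setting the brute-force search would be replaced by an inference routine suited to the model, and the subgradient would feed a regularized optimizer.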