A learning framework for higher-order consistency models in multi-class pixel labeling problems




Park, Kyoungup




Recently, higher-order Markov random field (MRF) models have been successfully applied to problems in computer vision, especially scene understanding. One successful higher-order MRF model for scene understanding is the consistency model [Kohli and Kumar, 2010; Kohli et al., 2009], together with the related work of Ladicky et al. [2009, 2013]; these models contain higher-order potentials composed of lower linear envelope functions. In semantic image segmentation, which seeks to assign each pixel of an image one of a set of pre-defined object and background labels, this model encourages consistent label assignments over segmented regions of the image. However, solving this MRF problem exactly is generally NP-hard, so efficient approximate inference algorithms are used instead. Furthermore, the lower linear envelope functions involve a large number of parameters that must be learned, and the cross-validation typically used for pairwise MRF models is not practical for estimating so many parameters. Yet few works have proposed efficient learning methods for the large number of parameters in these consistency models.

In this thesis, we propose a unified inference and learning framework for the consistency model. We investigate the issues that arise in inference and learning with this higher-order MRF model and present solutions to them, as follows.

First, we derive two variants of the consistency model for multi-class pixel labeling tasks. Our model defines an energy function that scores any given label assignment over an image. To perform maximum a posteriori (MAP) inference in this model, we minimize the energy function using move-making algorithms in which the higher-order problems are transformed into tractable pairwise problems. We then employ a max-margin framework for learning optimal parameters. This learning framework provides a general approach for searching the large parameter space.
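As a rough illustration of the lower linear envelope potentials mentioned above, the following sketch evaluates a potential of the form min_k (a_k * w + b_k) on a single region, where w is the fraction of pixels in the region that take a given label. The function names, the unweighted pixel fraction, and the particular slopes and intercepts are assumptions made for illustration only; the abstract does not give the thesis's exact parameterization.

```python
import numpy as np

# Hypothetical envelope parameters: three lines (a_k, b_k) whose pointwise
# minimum is a concave, piecewise-linear function on [0, 1].
SLOPES = np.array([2.0, 0.0, -2.0])
INTERCEPTS = np.array([0.0, 0.5, 2.0])


def lower_linear_envelope(w, slopes, intercepts):
    """Return min_k (slopes[k] * w + intercepts[k]) for a scalar w."""
    return float(np.min(slopes * w + intercepts))


def region_potential(labels, region, target_label, slopes, intercepts):
    """Higher-order cost of one region with respect to one label.

    labels: 1-D integer array with one label per pixel.
    region: indices of the pixels that belong to the region.
    target_label: the label whose share of the region is measured.
    """
    # Fraction of the region's pixels that currently take target_label.
    w = float(np.mean(labels[region] == target_label))
    return lower_linear_envelope(w, slopes, intercepts)
```

With these illustrative lines the potential is zero when a region is fully consistent with respect to the label (w = 0 or w = 1) and largest for mixed regions, which is the sense in which such a term encourages consistent labels within a region.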
Second, we propose a novel use of the Gaussian mixture model (GMM) for encoding consistency constraints over a large set of pixels. Here, we use various over-segmentation methods to define coherent regions for the consistency potentials. In general, mean shift (MS) produces locally coherent regions, whereas GMM provides globally coherent regions, which need not be contiguous. Our model exploits local and global information together and improves the labeling accuracy on real data sets. Accordingly, we use multiple higher-order terms associated with each over-segmentation method. Our learning framework allows us to deal with the large number of parameters that these multiple higher-order terms involve.

Next, we explore a dual decomposition (DD) method for our multi-class consistency model. The dual decomposition MRF (DD-MRF) is an alternative method for optimizing the energy function. In dual decomposition, a complex MRF problem is decomposed into many easy subproblems, and the relaxed dual problem is optimized using a projected subgradient method. At convergence, we expect a global optimum in the dual space because the dual is a concave maximization problem. To optimize our higher-order DD-MRF exactly, we propose an exact minimization algorithm for solving the higher-order subproblems. Moreover, this minimization algorithm is much more efficient than graph cuts. The dual decomposition approach also solves the max-margin learning problem by minimizing the dual losses derived from DD-MRF. Here, our minimization algorithm allows us to optimize the DD learning exactly and efficiently, which in most cases finds better parameters than the previous learning approach.

Last, we focus on improving the labeling accuracy of our higher-order model by incorporating mid-level features, which we call region features. The region features help customize the general envelope functions for individual segmented regions.
By assigning specific weights to the envelope functions, we can select a subset of highly likely labels for each segmented region. We train multiple classifiers on the region features and aggregate them to improve the prediction of likely labels for each region. Importantly, introducing these region features requires no change to the inference and learning algorithms described above.
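The dual decomposition step described in the abstract (easy subproblems coupled through a dual that is maximized by projected subgradient updates) can be sketched on a toy problem. The example below is an assumption-laden simplification: it uses a single discrete variable shared by several subproblems, each with its own cost table, whereas the thesis's subproblems are higher-order. The function name, the 1/(t+1) step size, and the fallback decoding are all hypothetical choices.

```python
import numpy as np


def dual_decomposition(theta, max_iters=100):
    """Toy dual decomposition for min_k sum_s theta[s][k].

    theta: array-like of shape (S, K); row s is the cost table of subproblem s.
    Returns (label, best_dual_value).
    """
    theta = np.asarray(theta, dtype=float)
    num_sub, num_labels = theta.shape
    # Dual variables; they are kept on the subspace where rows sum to zero.
    lam = np.zeros((num_sub, num_labels))
    best_dual = -np.inf
    for t in range(max_iters):
        costs = theta + lam
        x = costs.argmin(axis=1)         # each subproblem is solved independently
        dual = costs.min(axis=1).sum()   # lower bound on the primal optimum
        best_dual = max(best_dual, dual)
        if np.all(x == x[0]):
            # All subproblems agree, so the bound is tight for this label.
            return int(x[0]), best_dual
        onehot = np.eye(num_labels)[x]
        step = 1.0 / (t + 1)             # diminishing step size (an assumption)
        # Subgradient step projected onto {lam : sum over s of lam[s] = 0}.
        lam += step * (onehot - onehot.mean(axis=0))
    # No agreement within the budget: decode from the summed costs instead.
    return int(theta.sum(axis=0).argmin()), best_dual
```

Subtracting the mean of the one-hot solutions is what makes this a projected update: it keeps the dual variables summing to zero across subproblems, so the dual value remains a valid lower bound on the original energy.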



Markov random field, Structured Support Vector Machine, dual-decomposition, semantic segmentation, pixel labeling problems, inference, learning




Thesis (PhD)
