A sparse coding approach to multiclass pixel labelling

Datta, Susmita

Date

2015

Abstract

In this thesis, we introduce a method for multiclass pixel labelling to facilitate scene understanding and semantic segmentation. Specifically, we focus on enforcing label consistency in local image regions through sparse modelling of image data. Individual pixels in an image may be assigned a label denoting a semantic category (such as sky, grass, human or car) based on their observed features (such as colour and texture cues). Such an independent classification scheme, which disregards the labels assigned to neighbouring pixels, tends to produce inconsistent or noisy labellings: pixels belonging to the same class often end up with different labels. We employ a sparse denoising technique that approximates the noisy label configuration of a local image region with a more coherent one, formed by combining a subset of semantically consistent example label configurations. We supply a diverse collection of label configurations as models (atoms) in the form of a dictionary, and let a sparsity-promoting technique find a configuration that corrects the sporadic and spurious label annotations of a local image region. We develop our pipeline in several stages. Initially, we take an l1-minimisation approach: we impose an l1-norm sparsity prior to obtain a weighted sparse linear combination of the dictionary atoms that depicts a consistent label configuration for a local image patch. This initial formulation treats discrete, non-overlapping image patches individually, so it falls short of assigning labels that are continuous between pixels within and across patches. We tackle this by introducing a shift-invariant model, which combines the outcomes of the same l1-minimisation formulation applied to overlapping image patches. This model removes spurious label annotations that do not agree with the labels of their local image regions, and successfully traces class boundaries and fine details of object structures.
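
The l1 sparse-coding step and its shift-invariant extension can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not state: each label patch is encoded as a pixel-major one-hot vector y, the problem min_a ½‖y − Da‖² + λ‖a‖₁ is solved with ISTA (proximal gradient descent), which may differ from the solver used in the thesis, each pixel takes the class with the highest reconstructed score, and the overlapping variant averages those scores over every p × p window with stride 1. The function names, the four-atom dictionary, the patch size and the value lam=4.0 are hypothetical toy choices.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Flatten a 2-D patch of integer labels into a pixel-major one-hot vector of length p*p*K."""
    return np.eye(n_classes)[labels.ravel()].ravel()


def ista(D, y, lam, n_iter=500):
    """Approximately solve min_a 0.5*||y - D a||^2 + lam*||a||_1 with ISTA (proximal gradient)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L, where L = largest eigenvalue of D^T D
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - y))  # gradient step on the quadratic data term
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding (prox of l1)
    return a


def patch_scores(patch, D, n_classes, lam):
    """Sparse-code one label patch and return per-pixel class scores of shape (p, p, K)."""
    a = ista(D, one_hot(patch, n_classes), lam)
    return (D @ a).reshape(patch.shape + (n_classes,))


def denoise_patch(patch, D, n_classes, lam):
    """Non-overlapping variant: relabel each pixel with its highest reconstructed score."""
    return patch_scores(patch, D, n_classes, lam).argmax(axis=-1)


def denoise_shift_invariant(labels, D, n_classes, p, lam):
    """Overlapping variant: average scores from all p x p windows (stride 1), then take argmax.

    Assumes the image is at least p x p, so every pixel is covered by some window.
    """
    H, W = labels.shape
    scores = np.zeros((H, W, n_classes))
    counts = np.zeros((H, W, 1))
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            scores[i:i + p, j:j + p] += patch_scores(labels[i:i + p, j:j + p], D, n_classes, lam)
            counts[i:i + p, j:j + p] += 1
    return (scores / counts).argmax(axis=-1)


# Hypothetical 4x4, two-class dictionary: uniform class 0, uniform class 1,
# a vertical class boundary and a horizontal class boundary.
p, K = 4, 2
atoms = [
    np.zeros((p, p), dtype=int),
    np.ones((p, p), dtype=int),
    np.repeat([[0, 0, 1, 1]], p, axis=0),
    np.repeat([[0], [0], [1], [1]], p, axis=1),
]
D = np.stack([one_hot(atom, K) for atom in atoms], axis=1)  # shape (p*p*K, n_atoms)

noisy = atoms[2].copy()
noisy[0, 0] = 1  # one spurious label in an otherwise clean vertical-boundary patch
print(denoise_patch(noisy, D, K, lam=4.0))  # recovers atoms[2]
```

With lam=4.0 the sparse code of the noisy patch uses only the vertical-boundary atom, so the flipped pixel at (0, 0) is relabelled as class 0; for lam below 3 that single-atom code is no longer optimal and further atoms enter the combination, so the sparsity weight has to be tuned against the dictionary.
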
In another extension to our initial formulation, we target more extended image regions so that the label assignments of neighbouring pixels within and across patches agree with each other. We encourage adjacent pixels with similar appearance to settle on the same label. This is achieved by adding a total variation penalty to our existing l1-minimisation formulation, and this extension yields more accurate labellings. Our overall framework achieves performance comparable to state-of-the-art CRF-based pixel labelling methods that impose a smoothness prior in local image regions.
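
One plausible way to write the total-variation extension, sketched here with assumed notation rather than taken from the thesis, sums the per-patch l1 objectives over the n patches k covering a region and adds a weighted TV term on the per-pixel class scores. Here y_k is the one-hot encoding of the noisy labels in patch k, s_i is the K-vector of class scores at pixel i (the average of the reconstructions Dα_k over the patches containing i), 𝒩 is the set of 4-connected neighbouring pixel pairs, f_i is an appearance feature of pixel i, and the Gaussian weight w_ij with bandwidth σ is a hypothetical choice that makes label changes cheaper between dissimilar pixels:

```latex
\min_{\alpha_1,\dots,\alpha_n}\;
  \sum_{k=1}^{n}\Big(\tfrac{1}{2}\lVert y_k - D\alpha_k\rVert_2^2 + \lambda\lVert\alpha_k\rVert_1\Big)
  + \mu \sum_{(i,j)\in\mathcal{N}} w_{ij}\,\lVert s_i - s_j\rVert_1,
\qquad
w_{ij} = \exp\!\Big(-\frac{\lVert f_i - f_j\rVert_2^2}{2\sigma^2}\Big)
```

Larger μ favours piecewise-constant label maps, while the appearance-dependent weights relax that pressure across likely object boundaries.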