Uncertainty Calibration for Deep Neural Networks
Abstract
Deep neural networks have dramatically advanced performance on fundamental computer vision tasks. As a result, they are gradually being adopted in modern industries such as autonomous driving and medical diagnosis. The safety-critical nature of these fields, where failure could have catastrophic consequences, demands reliability guarantees beyond model accuracy. Calibration has been studied to provide such a statistical guarantee on the predictive uncertainties of probabilistic models, to which deep neural networks belong. Recent research on model calibration has shown that deep neural networks are prone to overconfidence, which tends to become more severe as model complexity increases. This has driven rigorous research on calibration methods that can address the overconfident predictions of deep neural networks.
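Miscalibration of the kind described above is commonly quantified with the Expected Calibration Error (ECE), which compares predicted confidence against empirical accuracy within confidence bins. The sketch below is an illustrative implementation of this standard metric, not code from the thesis; the function name and binning scheme are our own choices.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE sketch: bin predictions by confidence and
    average the gap between mean confidence and empirical accuracy,
    weighted by the fraction of samples in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; samples with confidence 0 are ignored.
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return float(ece)
```

An overconfident model that predicts with confidence 1.0 but is right only half the time yields an ECE of 0.5, whereas a model whose confidence matches its accuracy in every bin yields 0.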
In the existing model calibration literature, post-processing methods are preferred for their cost-effectiveness. However, they are vulnerable to miscalibration on target data exhibiting distributional shift. Research emphasis has therefore shifted to more costly training-based methods in pursuit of extra robustness against out-of-distribution data. Among these, loss-function-based methods have received the most attention. They typically incorporate a regularisation term on the model output to prevent the predictions from converging towards the vertices of the probability simplex. However, this pre-emptive reduction of prediction confidence usually renders the model underconfident on the training data, despite being calibrated on the target data. Data-augmentation-based methods, on the other hand, remain under-investigated in general. This leaves a gap in the literature for developing novel data-augmentation-based calibration methods and for understanding their calibration mechanisms.
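As a concrete illustration of a regularised objective that keeps predictions off the simplex vertices, consider label smoothing, a well-known technique of this family (used here only as an example; the thesis studies its own losses). Smoothing the one-hot target forces a strictly positive cross-entropy floor, so the model cannot drive its confidence to 1:

```python
import numpy as np

def smoothed_cross_entropy(logits, label, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    Smoothing moves probability mass off the simplex vertex of the
    true class, discouraging overconfident predictions; eps = 0
    recovers standard cross-entropy on a one-hot target.
    """
    k = logits.shape[-1]
    # Numerically stable softmax.
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    # Smoothed target: (1 - eps) on the true class, eps spread uniformly.
    target = np.full(k, eps / k)
    target[label] += 1.0 - eps
    return float(-(target * np.log(probs)).sum())
```

For a confidently correct prediction, the smoothed loss stays strictly above the unsmoothed one, which is precisely the pre-emptive confidence reduction the paragraph above describes.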
This thesis contributes three novel model calibration methods that improve our understanding of the calibration effects of data augmentation techniques on deep neural networks. Firstly, it presents an auxiliary uncertainty estimation method that trains an extra network to model the gap between the predictions of the main classification model and the ground-truth label as predictive uncertainty. The estimate from the auxiliary network proves effective in calibrating the main classification model, presenting a novel technique beyond the primary categories in the model calibration literature. Secondly, a stochastic label augmentation technique is proposed to calibrate deep neural networks. It follows the principle of maximum-entropy inference to scale the augmentation strength of the label, which translates to modulating the prediction confidences while maintaining the accuracy of an ideal model on the target data. Lastly, a self-calibrating vicinal risk minimisation framework is proposed that leverages both data enrichment and a regularised confidence topology in data space to achieve robust calibration performance on both in-distribution and out-of-distribution data. It enforces absolute confidence on the ground-truth training data and associates gradually reduced confidences with augmented data in their vicinity. Furthermore, a logit loss is derived from the proposed self-calibrating vicinal risk as a calibration-focused loss function, establishing a connection between loss-function-based and data-augmentation-based calibration methods.
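The idea of full confidence on ground-truth points and decaying confidence in their vicinity can be sketched with a mixup-style interpolation. The sketch below is a hypothetical illustration of the general principle, not the thesis's actual framework: `lam` is an assumed interpolation coefficient, and the linear confidence decay is one simple choice of "confidence topology".

```python
import numpy as np

def vicinal_example(x, x_other, y, num_classes, lam):
    """Hypothetical sketch: a vicinal sample with decayed confidence.

    The augmented input interpolates between a training point x (with
    ground-truth class y) and a neighbour x_other. The target confidence
    on class y decays from 1.0 as the sample moves away from x;
    lam = 1 recovers the original point with absolute confidence.
    """
    x_aug = lam * x + (1.0 - lam) * x_other
    # Spread the remaining (1 - lam) mass uniformly over the other classes.
    target = np.full(num_classes, (1.0 - lam) / (num_classes - 1))
    target[y] = lam
    return x_aug, target
```

At `lam = 1` the target is the one-hot vertex; as `lam` shrinks, the target distribution smoothly moves toward the interior of the simplex, mirroring the gradually reduced confidences associated with augmented data above.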