Unsupervised Model Evaluation

Date

2023

Authors

Deng, Weijian

Abstract

Understanding model decisions under novel test scenarios is central to machine learning. The standard textbook practice is to evaluate a model on a held-out test set that is fully labeled and drawn from the same distribution as the training set. However, this supervised form of evaluation is often infeasible in real-world deployment, where test environments undergo distribution shifts and data annotations are not provided. Moreover, the discrepancy between training and test distributions can cause significant performance drops. It is therefore important to develop new evaluation schemes for real-world scenarios where annotated data is unavailable. In this thesis, we explore the answer to an interesting question: are labels always necessary for model evaluation? Motivated by this question, we investigate an important but under-explored problem called unsupervised model evaluation, where the goal is to estimate model generalization on various unlabeled out-of-distribution test sets. In particular, this thesis makes contributions to unsupervised model evaluation from four aspects.

In Chapter 3, we report a strong negative linear correlation between model performance and distribution shift. Based on this observation, we propose to predict model accuracy from dataset-level statistics and present two regression methods (i.e., linear regression and network regression) for accuracy estimation.

In Chapter 4, we propose to use self-supervision as a criterion for evaluating models. Specifically, we train supervised semantic image classification and self-supervised rotation prediction jointly in a multi-task fashion. On a series of datasets, we report an interesting finding: semantic classification accuracy exhibits a strong linear relationship with the accuracy of the rotation prediction task.
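As an illustration of the regression idea underlying Chapters 3 and 4, the sketch below fits a linear relationship between an unsupervised signal (here, rotation-prediction accuracy) and classifier accuracy, then uses it to estimate accuracy on a new unlabeled set. All numbers and variable names are invented for illustration, not taken from the thesis:

```python
import numpy as np

# Hypothetical per-dataset measurements on labeled "seed" test sets:
# an unsupervised signal (rotation-prediction accuracy) paired with the
# classifier's true accuracy on the same dataset.
rotation_acc = np.array([0.55, 0.62, 0.70, 0.78, 0.85])
classifier_acc = np.array([0.41, 0.50, 0.61, 0.72, 0.81])

# Fit the linear relationship: classifier_acc ~ a * signal + b.
a, b = np.polyfit(rotation_acc, classifier_acc, deg=1)

# On a new unlabeled test set, rotation accuracy is computable from
# freely self-generated rotation labels; plug it into the fitted line
# to estimate classification accuracy without any semantic labels.
new_rotation_acc = 0.66
estimated_acc = a * new_rotation_acc + b
print(f"estimated classifier accuracy: {estimated_acc:.3f}")
```

The same recipe applies to the Chapter 3 setting by replacing the rotation signal with a dataset-level distribution-shift statistic.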
This finding allows us to use linear regression to estimate classifier performance from rotation prediction accuracy, which can be obtained on the test set through freely self-generated rotation labels.

In Chapter 5, unlike recent methods that use only prediction confidence, we further consider prediction dispersity. Confidence reflects whether an individual prediction is certain; dispersity indicates how the overall predictions are distributed across all categories. Our key insight is that a well-performing model should give predictions with both high confidence and high dispersity, so we consider both properties to make more accurate estimates. To this end, we use the nuclear norm, which has been shown to characterize both properties. We show that the nuclear norm yields more accurate and stable accuracy estimates than existing methods.

In Chapter 6, from a model-centric perspective, we study the relationship between model generalization and invariance. The former characterizes how well a model performs on in-distribution or out-of-distribution test data, while the latter captures whether the model gives consistent predictions when the input data is transformed. We perform large-scale quantitative correlation studies between generalization and invariance and observe that, across different models, the two exhibit a strong linear relationship on both in-distribution and out-of-distribution datasets. This finding allows us to assess and rank the performance of various models on a new dataset.

All in all, this thesis focuses on evaluating models on previously unseen distributions without annotations, an important but under-explored challenge with implications for increasing the reliability of machine learning models.
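The Chapter 5 criterion can be illustrated with a toy example: the nuclear norm (sum of singular values) of the softmax prediction matrix is larger when predictions are both confident and spread across classes. The matrices below are invented purely to show the effect:

```python
import numpy as np

def nuclear_norm_score(probs: np.ndarray) -> float:
    """Nuclear norm of an N x K softmax prediction matrix
    (sum of its singular values)."""
    return float(np.linalg.norm(probs, ord="nuc"))

# Confident and dispersed: each row peaks on a different class.
confident_dispersed = np.array([
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
])

# Confident but collapsed: every row predicts the same class.
confident_collapsed = np.array([
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
])

# The dispersed matrix scores higher, since the nuclear norm
# rewards both confidence and dispersity.
print(nuclear_norm_score(confident_dispersed))
print(nuclear_norm_score(confident_collapsed))
```

The collapsed matrix has rank one, so only a single singular value contributes to its score, while the dispersed matrix contributes three.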
Our extensive analysis demonstrates that these studies take a promising step toward estimating performance drops under distribution shift and lay the groundwork for future research on unsupervised model evaluation.
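One simple way to operationalize the invariance studied in Chapter 6 is the fraction of inputs whose predicted class is unchanged under a transformation. This specific metric is an assumption for illustration, not necessarily the thesis's exact measure:

```python
import numpy as np

def invariance_score(preds_original: np.ndarray,
                     preds_transformed: np.ndarray) -> float:
    """Fraction of inputs whose predicted class is unchanged under the
    transformation (a hypothetical consistency-based invariance measure)."""
    return float(np.mean(preds_original == preds_transformed))

# Hypothetical argmax predictions on a batch, before and after
# transforming the inputs (e.g., rotation).
orig = np.array([0, 1, 2, 2, 1, 0])
rot = np.array([0, 1, 2, 1, 1, 0])

# 5 of 6 predictions are unchanged under the transformation.
score = invariance_score(orig, rot)
print(score)
```

Under the thesis's finding, a score like this, computed without labels, would correlate linearly with a model's accuracy and thus support ranking models on a new dataset.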

Type

Thesis (PhD)
