What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?

DC Field | Value | Language
dc.contributor.author | Tu, Weijie | en
dc.contributor.author | Deng, Weijian | en
dc.contributor.author | Zheng, Liang | en
dc.contributor.author | Gedeon, Tom | en
dc.date.accessioned | 2025-05-23T11:24:09Z |
dc.date.available | 2025-05-23T11:24:09Z |
dc.date.issued | 2024 | en
dc.description.abstract | This work aims to develop a measure that can accurately rank the performance of various classifiers when they are tested on unlabeled out-of-distribution (OOD) data. We begin by demonstrating that conventional uncertainty metrics, notably the maximum Softmax prediction probability, are inherently useful for forecasting model generalization in certain OOD contexts. Building on this insight, we introduce a new measure called Softmax Correlation (SoftmaxCorr). It calculates the cosine similarity between a class-class correlation matrix, constructed from Softmax output vectors across an unlabeled test dataset, and a predefined reference matrix that embodies ideal class correlations. A high similarity between this correlation matrix and the reference matrix signals that the model delivers confident and uniform predictions across all categories, reflecting minimal uncertainty and confusion. Through rigorous evaluation across a suite of datasets, including ImageNet, CIFAR-10, and WILDS, we affirm the predictive validity of SoftmaxCorr in accurately forecasting model performance in both in-distribution (ID) and OOD settings. Furthermore, we discuss the limitations of our proposed measure and suggest avenues for future research. | en
dc.description.status | Peer-reviewed | en
dc.identifier.scopus | 85213130774 | en
dc.identifier.uri | http://www.scopus.com/inward/record.url?scp=85213130774&partnerID=8YFLogxK | en
dc.identifier.uri | https://hdl.handle.net/1885/733752168 |
dc.language.iso | en | en
dc.rights | Publisher Copyright: © 2024, Transactions on Machine Learning Research. All rights reserved. | en
dc.source | Transactions on Machine Learning Research | en
dc.title | What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions? | en
dc.type | Journal article | en
dspace.entity.type | Publication | en
local.contributor.affiliation | Tu, Weijie; ANU College of Systems and Society, The Australian National University | en
local.contributor.affiliation | Deng, Weijian; School of Computing, ANU College of Systems and Society, The Australian National University | en
local.contributor.affiliation | Zheng, Liang; School of Computing, ANU College of Systems and Society, The Australian National University | en
local.contributor.affiliation | Gedeon, Tom; School of Computing, ANU College of Systems and Society, The Australian National University | en
local.identifier.citationvolume | 2024 | en
local.identifier.pure | ffa5fd91-6c25-4163-b1a5-f1622bbc88f7 | en
local.identifier.url | https://www.scopus.com/pages/publications/85213130774 | en
local.type.status | Published | en
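
The abstract describes SoftmaxCorr only in words. Below is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that the class-class correlation matrix is C = PᵀP / N, where P is the N × k matrix of Softmax outputs on the unlabeled test set. It also assumes that the reference matrix is the scaled identity I / k, which is what perfectly confident predictions spread evenly over k classes would produce. The helper names (`softmax_corr`, `average_msp`, `class_correlation`) and the toy logits are hypothetical. The mean maximum Softmax probability is included as the simpler uncertainty baseline the abstract mentions. Because cosine similarity is scale-invariant, using I instead of I / k would give the same score.

```python
import math
from typing import List

# Hypothetical alias: N rows (test samples) x k columns (classes).
Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable Softmax for a single logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_msp(probs: Matrix) -> float:
    """Mean maximum Softmax probability (MSP) over an unlabeled test set."""
    return sum(max(p) for p in probs) / len(probs)


def class_correlation(probs: Matrix) -> Matrix:
    """Assumed k x k class-class correlation matrix C = P^T P / N."""
    n, k = len(probs), len(probs[0])
    return [
        [sum(p[i] * p[j] for p in probs) / n for j in range(k)]
        for i in range(k)
    ]


def softmax_corr(probs: Matrix) -> float:
    """Cosine similarity between C and the assumed reference R = I / k."""
    c = class_correlation(probs)
    k = len(c)
    # Frobenius inner product <C, R> reduces to trace(C) / k when R = I / k.
    inner = sum(c[i][i] for i in range(k)) / k
    norm_c = math.sqrt(sum(v * v for row in c for v in row))
    norm_r = 1.0 / math.sqrt(k)  # ||I / k||_F = sqrt(k) / k
    return inner / (norm_c * norm_r)


if __name__ == "__main__":
    # Made-up logits for N = 3 unlabeled samples and k = 3 classes.
    logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
    probs = [softmax(z) for z in logits]
    print("average MSP:", average_msp(probs))
    print("SoftmaxCorr:", softmax_corr(probs))
```

Under these assumptions the score equals 1 when predictions are one-hot and evenly spread across classes. It falls to 1/√k both when every prediction is uniform (uncertain) and when every prediction collapses confidently onto a single class. This matches the abstract's reading of a high score as "confident and uniform predictions across all categories."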
