ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

dc.contributor.author: Tian, Xinyu [en]
dc.contributor.author: Zou, Shu [en]
dc.contributor.author: Yang, Zhaoyuan [en]
dc.contributor.author: Zhang, Jing [en]
dc.date.accessioned: 2025-05-23T04:21:09Z
dc.date.available: 2025-05-23T04:21:09Z
dc.date.issued: 2024 [en]
dc.description.abstract: Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L) models for downstream tasks, it shows limitations in dealing with distribution shifts. We address this issue with Attribute-Guided Prompt Tuning (ArGue), making three key contributions. 1) In contrast to the conventional approach of directly appending soft prompts preceding class names, we align the model with primitive visual attributes generated by Large Language Models (LLMs). We posit that a model's ability to express high confidence in these attributes signifies its capacity to discern the correct class rationales. 2) We introduce attribute sampling to eliminate disadvantageous attributes, so that only semantically meaningful attributes are preserved. 3) We propose negative prompting, explicitly enumerating class-agnostic attributes to activate spurious correlations and encourage the model to generate highly orthogonal probability distributions in relation to these negative features. In experiments, our method significantly outperforms current state-of-the-art prompt tuning methods on both novel class prediction and out-of-distribution generalization tasks. The code is available at https://github.com/Liam-Tian/ArGue. [en]
dc.description.status: Peer-reviewed [en]
dc.format.extent: 10 [en]
dc.identifier.issn: 1063-6919 [en]
dc.identifier.scopus: 85188970177 [en]
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85188970177&partnerID=8YFLogxK [en]
dc.identifier.uri: https://hdl.handle.net/1885/733751288
dc.language.iso: en [en]
dc.relation.ispartofseries: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 [en]
dc.rights: Publisher Copyright: © 2024 IEEE. [en]
dc.source: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition [en]
dc.subject: few-shot adaptation [en]
dc.subject: prompt tuning [en]
dc.subject: vision-language model [en]
dc.title: ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models [en]
dc.type: Conference paper [en]
dspace.entity.type: Publication [en]
local.bibliographicCitation.lastpage: 28587 [en]
local.bibliographicCitation.startpage: 28578 [en]
local.contributor.affiliation: Tian, Xinyu; Australian National University [en]
local.contributor.affiliation: Zou, Shu; Australian National University [en]
local.contributor.affiliation: Yang, Zhaoyuan; GE Research [en]
local.contributor.affiliation: Zhang, Jing; School of Computing, ANU College of Systems and Society, The Australian National University [en]
local.identifier.doi: 10.1109/CVPR52733.2024.02700 [en]
local.identifier.pure: fcb5d592-6fa6-47d3-b686-bc705668c9d5 [en]
local.identifier.url: https://www.scopus.com/pages/publications/85188970177 [en]
local.type.status: Published [en]
