Generative Transformer for Accurate and Reliable Salient Object Detection

dc.contributor.author: Mao, Yuxin (en)
dc.contributor.author: Zhang, Jing (en)
dc.contributor.author: Wan, Zhexiong (en)
dc.contributor.author: Tian, Xinyu (en)
dc.contributor.author: Li, Aixuan (en)
dc.contributor.author: Lv, Yunqiu (en)
dc.contributor.author: Dai, Yuchao (en)
dc.date.accessioned: 2025-05-23T09:21:45Z
dc.date.available: 2025-05-23T09:21:45Z
dc.date.issued: 2025 (en)
dc.description.abstract: We explore the impact of transformers on accurate and reliable salient object detection. For accuracy, we integrate the transformer with a deterministic model and delineate its advantages in structural modeling. Regarding reliability, we address the transformer's tendency to produce overly confident, incorrect predictions. To gauge reliability implicitly, we introduce a latent variable model within the transformer framework, termed the inferential generative adversarial network (iGAN). The stochastic nature of the latent variable facilitates the estimation of predictive uncertainty, which serves as an auxiliary measure of the model's prediction reliability. Unlike the conventional GAN, which fixes the distribution of the latent variable as the standard normal distribution N(0, I), the proposed iGAN infers the latent variable by gradient-based Markov Chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply our proposed iGAN to fully supervised salient object detection, showing that iGAN within the transformer framework leads to both accurate and reliable salient object detection. The source code and experimental results are publicly available via our project page: https://npucvr.github.io/TransformerSOD (en)
dc.description.sponsorship: This work was partly supported by the National Natural Science Foundation of China (62271410). Yuxin Mao is sponsored by the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (CX2024014). We thank Dr. Dengping Fan for the discussion and help during the manuscript submission process. (en)
dc.description.status: Peer-reviewed (en)
dc.format.extent: 14 (en)
dc.identifier.issn: 1051-8215 (en)
dc.identifier.scopus: 85205319292 (en)
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85205319292&partnerID=8YFLogxK (en)
dc.identifier.uri: https://hdl.handle.net/1885/733751894
dc.language.iso: en (en)
dc.rights: © 1991-2012 IEEE. (en)
dc.source: IEEE Transactions on Circuits and Systems for Video Technology (en)
dc.subject: inferential generative adversarial network (en)
dc.subject: salient object detection (en)
dc.subject: Vision transformer (en)
dc.title: Generative Transformer for Accurate and Reliable Salient Object Detection (en)
dc.type: Journal article (en)
dspace.entity.type: Publication (en)
local.bibliographicCitation.lastpage: 1054 (en)
local.bibliographicCitation.startpage: 1041 (en)
local.contributor.affiliation: Mao, Yuxin; Northwestern Polytechnical University Xian (en)
local.contributor.affiliation: Zhang, Jing; School of Computing, ANU College of Systems and Society, The Australian National University (en)
local.contributor.affiliation: Wan, Zhexiong; Northwestern Polytechnical University Xian (en)
local.contributor.affiliation: Tian, Xinyu; Northwestern Polytechnical University Xian (en)
local.contributor.affiliation: Li, Aixuan; Northwestern Polytechnical University Xian (en)
local.contributor.affiliation: Lv, Yunqiu; Northwestern Polytechnical University Xian (en)
local.contributor.affiliation: Dai, Yuchao; Northwestern Polytechnical University Xian (en)
local.identifier.citationvolume: 35 (en)
local.identifier.doi: 10.1109/TCSVT.2024.3469286 (en)
local.identifier.pure: 62f02a02-0c98-4626-b14a-36682b4e0371 (en)
local.identifier.url: https://www.scopus.com/pages/publications/85205319292 (en)
local.type.status: Published (en)
