Unsupervised Dense Prediction Using Differentiable Normalized Cuts
Authors
Liu, Yanbin
Gould, Stephen
Publisher
Springer Science+Business Media B.V.
Abstract
With the emergent attentive properties of self-supervised Vision Transformers (ViT), Normalized Cuts (NCut) has resurfaced as a powerful tool for unsupervised dense prediction. However, the pre-trained ViT backbone (e.g., DINO) is frozen in existing methods, which makes the feature extractor suboptimal for dense prediction tasks. In this paper, we propose using Differentiable Normalized Cuts for self-supervised dense feature learning that can improve the dense prediction capability of existing pre-trained models. First, we review an efficient gradient formulation for the classical NCut algorithm. This formulation only leverages matrices computed and stored in the forward pass, making the backward pass highly efficient. Second, with NCut gradients in hand, we design a self-supervised dense feature learning architecture to finetune pre-trained models. Given two random augmented crops of an image, the architecture performs RoIAlign and NCut to generate two foreground masks of their overlapping region. Finally, we propose a mask-consistency loss to back-propagate through NCut and RoIAlign for model training. Experiments show that our framework generalizes to various pre-training methods (DINO, MoCo and MAE), network configurations (ResNet, ViT-S and ViT-B), and tasks (unsupervised saliency detection, object discovery and semantic segmentation). Moreover, we achieve state-of-the-art results on unsupervised dense prediction benchmarks.
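The classical NCut step the abstract builds on can be illustrated with a minimal sketch: given per-patch features, build an affinity graph and bipartition it via the Fiedler vector of the normalized graph Laplacian. This is a generic NumPy illustration of standard spectral bipartitioning, not the paper's differentiable formulation or its gradient computation; the function name, the cosine-affinity construction, and the threshold `tau` are assumptions for illustration only.

```python
import numpy as np

def ncut_foreground_mask(features, tau=0.2):
    """Illustrative (hypothetical) NCut bipartition of patch features.

    features: (N, D) array of patch embeddings (e.g. from a ViT backbone).
    Returns a boolean mask of shape (N,) marking the putative foreground.
    """
    # Cosine-similarity affinity matrix between patches.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    W = f @ f.T
    # Binarize affinities; keep a tiny weight so the graph stays connected.
    W = np.where(W > tau, 1.0, 1e-5)

    # Symmetric normalized Laplacian: L = D^{-1/2} (D - W) D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt

    # The eigenvector for the second-smallest eigenvalue (Fiedler vector)
    # gives the relaxed NCut bipartition; threshold it at its mean.
    _, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]
    mask = fiedler > fiedler.mean()

    # Convention: call the smaller partition "foreground".
    if mask.sum() > (~mask).sum():
        mask = ~mask
    return mask
```

On two well-separated synthetic clusters, the smaller cluster is returned as the foreground partition; the differentiable version described in the abstract additionally back-propagates through this eigen-decomposition to finetune the feature extractor.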
Book Title
Computer Vision – ECCV 2024: 18th European Conference, Proceedings
Entity type
Publication