Unsupervised Dense Prediction Using Differentiable Normalized Cuts

Authors

Liu, Yanbin
Gould, Stephen

Publisher

Springer Science+Business Media B.V.

Abstract

With the emergent attention properties of self-supervised Vision Transformers (ViTs), Normalized Cuts (NCut) has resurfaced as a powerful tool for unsupervised dense prediction. However, the pre-trained ViT backbone (e.g., DINO) is frozen in existing methods, which makes the feature extractor suboptimal for dense prediction tasks. In this paper, we propose using Differentiable Normalized Cuts for self-supervised dense feature learning, which can improve the dense prediction capability of existing pre-trained models. First, we review an efficient gradient formulation for the classical NCut algorithm. This formulation leverages only matrices computed and stored in the forward pass, making the backward pass highly efficient. Second, with NCut gradients in hand, we design a self-supervised dense feature learning architecture to finetune pre-trained models. Given two randomly augmented crops of an image, the architecture performs RoIAlign and NCut to generate two foreground masks of their overlapping region. Finally, we propose a mask-consistency loss to back-propagate through NCut and RoIAlign for model training. Experiments show that our framework generalizes to various pre-training methods (DINO, MoCo and MAE), network configurations (ResNet, ViT-S and ViT-B), and tasks (unsupervised saliency detection, object discovery and semantic segmentation). Moreover, we achieve state-of-the-art results on unsupervised dense prediction benchmarks.
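As a rough illustration of the NCut step the abstract refers to (a minimal sketch, not the authors' implementation), the classical NCut forward pass bipartitions patches by thresholding the Fiedler vector of the normalized graph Laplacian built from a patch-affinity matrix. The function name `ncut_mask` and the cosine-affinity construction below are illustrative assumptions; in the paper this step is made differentiable so gradients from the mask-consistency loss can flow back into the backbone, reusing matrices already computed in the forward pass.

```python
import numpy as np

def ncut_mask(W):
    """Classical NCut bipartition on a symmetric, non-negative affinity matrix W.

    Returns a boolean foreground/background mask over the n graph nodes
    (e.g., ViT patches). Illustrative sketch only.
    """
    d = W.sum(axis=1)                          # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    fiedler = vecs[:, 1]                       # eigenvector of 2nd-smallest eigenvalue
    return fiedler > fiedler.mean()            # threshold into two groups

# Example: affinities from (hypothetical) patch features via clipped cosine similarity.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
W = np.clip(feats @ feats.T, 0.0, None)
mask = ncut_mask(W)                            # separates the two feature clusters
```

In an autodiff framework, replacing `np.linalg.eigh` with its differentiable counterpart (e.g., `torch.linalg.eigh`) lets a loss on the resulting soft mask back-propagate to the features, which is the role Differentiable NCut plays in the training loop described above.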

Book Title

Computer Vision – ECCV 2024: 18th European Conference, Proceedings

Entity type

Publication
