Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

Authors

Xu, Ming
Gould, Stephen

Publisher

IEEE Computer Society

Abstract

We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov-Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where our method is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
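The abstract names the solver (projected mirror descent on a fused Gromov-Wasserstein problem) but not its details. The block below is a minimal pure-Python sketch, under our own assumptions, of how a temporal consistency prior can enter such an objective and how entropic (KL) mirror descent with a Sinkhorn projection solves it. It is not the authors' implementation. It uses balanced uniform marginals instead of an unbalanced relaxation. The frame structure matrix (frames at most `radius` apart), the action structure (1 minus identity), the values of `alpha`, `step` and `iters`, and every function name are hypothetical choices made for illustration.

```python
import math


def temporal_structure(n, radius):
    """Hypothetical frame-structure matrix: 1.0 for distinct frames at most
    `radius` apart, 0.0 otherwise (symmetric)."""
    return [[1.0 if 0 < abs(i - j) <= radius else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(X, Y):
    """Plain nested-list matrix product X @ Y."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(X))]


def sinkhorn_project(T, row_marg, col_marg, iters=200):
    """KL projection of a positive matrix onto couplings with the given
    marginals, computed by alternating row and column rescaling (Sinkhorn)."""
    n, k = len(T), len(T[0])
    for _ in range(iters):
        for i in range(n):
            s = sum(T[i])
            T[i] = [v * row_marg[i] / s for v in T[i]]
        for j in range(k):
            s = sum(T[i][j] for i in range(n))
            for i in range(n):
                T[i][j] *= col_marg[j] / s
    return T


def fgw_mirror_descent(C, radius=2, alpha=0.5, step=10.0, iters=50):
    """Entropic mirror descent on a balanced fused GW objective
        (1 - alpha) * <C, T> + alpha * sum_{i,k,j,l} A[i][k] B[j][l] T[i][j] T[k][l]
    where A links nearby frames and B = 1 - I penalises assigning them to
    different actions. All hyperparameters here are illustrative guesses."""
    n, k = len(C), len(C[0])
    A = temporal_structure(n, radius)
    B = [[0.0 if a == b else 1.0 for b in range(k)] for a in range(k)]
    row, col = [1.0 / n] * n, [1.0 / k] * k  # balanced uniform marginals (assumption)
    T = [[1.0 / (n * k)] * k for _ in range(n)]
    for _ in range(iters):
        # Gradient of the objective; A and B are symmetric, hence the factor 2.
        ATB = matmul(matmul(A, T), B)
        grad = [[(1 - alpha) * C[i][j] + 2 * alpha * ATB[i][j] for j in range(k)]
                for i in range(n)]
        # Multiplicative (mirror) step under the KL geometry, then project.
        T = [[T[i][j] * math.exp(-step * grad[i][j]) for j in range(k)]
             for i in range(n)]
        T = sinkhorn_project(T, row, col)
    return T


def decode(T):
    """Assign each frame the action carrying most of its transported mass."""
    return [max(range(len(r)), key=r.__getitem__) for r in T]
```

For example, a 12 x 2 cost matrix whose first six rows favour action 0 and whose last six favour action 1 decodes to two contiguous segments, `[0] * 6 + [1] * 6`. Because the column marginal here is uniform, every action is forced to receive an equal share of frames; loosening that restriction is what an unbalanced formulation is for.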

Book Title

Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

Entity type

Publication
