ConsistNet: Enforcing 3D Consistency for Multi-View Images Diffusion

Yang, Jiayu; Cheng, Ziang; Duan, Yunfei; Ji, Pan; Li, Hongdong

Date

2024

Authors

Yang, Jiayu

Cheng, Ziang

Duan, Yunfei

Ji, Pan

Li, Hongdong

Abstract

Given a single image of a 3D object, this paper proposes a novel method (named ConsistNet) that generates multiple images of the same object, as if they were captured from different viewpoints, while effectively exploiting the 3D (multi-view) consistency among the generated images. Central to our method is a lightweight multi-view consistency block that enables information exchange across multiple single-view diffusion processes based on the underlying multi-view geometry principles. ConsistNet is an extension to the standard latent diffusion model (LDM) and consists of two submodules: (a) a view aggregation module that unprojects multi-view features into global 3D volumes and infers consistency, and (b) a ray aggregation module that samples and aggregates 3D-consistent features back to each view to enforce consistency. Our approach departs from previous methods in multi-view image generation in that it can be easily dropped into pretrained LDMs without requiring explicit pixel correspondences or depth prediction. Experiments show that our method effectively learns 3D consistency over a frozen Zero123-XL backbone and can generate 16 surrounding views of the object within 11 seconds on a single A100 GPU. Our code will be made available at https://github.com/JiayuYANG/ConsistNet.
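
The two submodules described in the abstract can be illustrated with a small geometric sketch. This is not the authors' implementation: the abstract does not give the architecture in detail, so the pinhole camera model, the nearest-neighbour sampling, the use of per-voxel mean and variance as a stand-in for the learned consistency inference, and all function and parameter names (`project`, `view_aggregate`, `ray_aggregate`, the `cam` dictionary layout) are assumptions made for illustration only.

```python
import math
from statistics import mean, pvariance

def project(point, cam):
    """Pinhole projection of a world point; returns (u, v) or None if behind the camera."""
    R, t, f, cx, cy = cam["R"], cam["t"], cam["f"], cam["cx"], cam["cy"]
    # World -> camera coordinates: x_cam = R @ x_world + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)

def sample(feature_map, u, v):
    """Nearest-neighbour lookup; None if the pixel falls outside the map."""
    row, col = math.floor(v + 0.5), math.floor(u + 0.5)
    if 0 <= row < len(feature_map) and 0 <= col < len(feature_map[0]):
        return feature_map[row][col]
    return None

def view_aggregate(feature_maps, cams, voxel_centers):
    """Unproject per-view features onto voxels.

    Returns one (mean, variance) pair per voxel, or None for voxels seen by
    no view. Mean/variance is a hand-written stand-in (assumption) for the
    learned consistency inference mentioned in the abstract.
    """
    volume = []
    for p in voxel_centers:
        feats = []
        for fmap, cam in zip(feature_maps, cams):
            uv = project(p, cam)
            if uv is not None:
                s = sample(fmap, uv[0], uv[1])
                if s is not None:
                    feats.append(s)
        volume.append((mean(feats), pvariance(feats)) if feats else None)
    return volume

def ray_aggregate(volume, voxel_centers, cam, pixel, depths):
    """Average volume features along the camera ray through `pixel`.

    Each depth sample is matched to the nearest voxel that holds a feature
    (a crude lookup, assumed here in place of interpolation or learned weights).
    """
    R, t, f, cx, cy = cam["R"], cam["t"], cam["f"], cam["cx"], cam["cy"]
    valid = [i for i, v in enumerate(volume) if v is not None]
    samples = []
    for d in depths:
        # Back-project the pixel to depth d in camera coordinates
        xc = [d * (pixel[0] - cx) / f, d * (pixel[1] - cy) / f, d]
        # Camera -> world coordinates: x_world = R^T @ (x_cam - t)
        xw = [sum(R[j][i] * (xc[j] - t[j]) for j in range(3)) for i in range(3)]
        best = min(
            valid,
            key=lambda i: sum((a - b) ** 2 for a, b in zip(voxel_centers[i], xw)),
            default=None,
        )
        if best is not None:
            samples.append(volume[best][0])
    return mean(samples) if samples else None
```

In this sketch, a low per-voxel variance indicates that the views agree on the feature at that 3D location, and `ray_aggregate` returns a single aggregated value that could be fed back to the corresponding pixel of one view. Note that the geometry relies only on known camera poses, in line with the abstract's claim that no explicit pixel correspondences or depth prediction are required.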

Keywords

image generation, latent diffusion model

URI

http://www.scopus.com/inward/record.url?scp=85201963317&partnerID=8YFLogxK
https://hdl.handle.net/1885/733751312

Collections

ANU Research Publications

Source

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Type

Conference paper

Entity type

Publication

DOI

10.1109/CVPR52733.2024.00676
