Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

Field | Value | Language
dc.contributor.author | Yan, Han | en
dc.contributor.author | Li, Yang | en
dc.contributor.author | Wu, Zhennan | en
dc.contributor.author | Chen, Shenzhou | en
dc.contributor.author | Sun, Weixuan | en
dc.contributor.author | Shang, Taizhang | en
dc.contributor.author | Liu, Weizhe | en
dc.contributor.author | Chen, Tian | en
dc.contributor.author | Dai, Xiaqiang | en
dc.contributor.author | Ma, Chao | en
dc.contributor.author | Li, Hongdong | en
dc.contributor.author | Ji, Pan | en
dc.date.accessioned | 2025-05-23T02:22:12Z |
dc.date.available | 2025-05-23T02:22:12Z |
dc.date.issued | 2024-12-03 | en
dc.description.abstract | We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in a single tri-plane tensor, from which multiple Signed Distance Function (SDF) fields can be decoded to represent the compositional shapes. During training, an auto-encoder compresses tri-planes into a latent space, and a denoising diffusion process is then employed to approximate the distribution of the compositional scenes. Frankenstein demonstrates promising results in generating room interiors as well as human avatars with automatically separated parts. The generated scenes facilitate many downstream applications, such as part-wise re-texturing, object rearrangement in the room, or avatar cloth re-targeting. | en
dc.description.sponsorship | This work was supported in part by NSFC (62322113, 62376156) and Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102). | en
dc.description.status | Peer-reviewed | en
dc.identifier.isbn | 9798400711312 | en
dc.identifier.other | ORCID:/0000-0003-4125-1554/work/184100031 | en
dc.identifier.scopus | 85217101369 | en
dc.identifier.uri | http://www.scopus.com/inward/record.url?scp=85217101369&partnerID=8YFLogxK | en
dc.identifier.uri | https://hdl.handle.net/1885/733750817 |
dc.language.iso | en | en
dc.publisher | Association for Computing Machinery (ACM) | en
dc.relation.ispartof | Proceedings - SIGGRAPH Asia 2024 Conference Papers, SA 2024 | en
dc.relation.ispartofseries | 2024 SIGGRAPH Asia 2024 Conference Papers, SA 2024 | en
dc.relation.ispartofseries | Proceedings - SIGGRAPH Asia 2024 Conference Papers, SA 2024 | en
dc.rights | Publisher Copyright: © 2024 Copyright held by the owner/author(s). | en
dc.subject | 3D Scene Generation | en
dc.subject | Diffusion Model | en
dc.subject | Semantic Composition | en
dc.title | Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane | en
dc.type | Conference paper | en
dspace.entity.type | Publication | en
local.contributor.affiliation | Yan, Han; Shanghai Jiao Tong University | en
local.contributor.affiliation | Li, Yang; Tencent | en
local.contributor.affiliation | Wu, Zhennan; The University of Tokyo | en
local.contributor.affiliation | Chen, Shenzhou; Tencent | en
local.contributor.affiliation | Sun, Weixuan; Tencent | en
local.contributor.affiliation | Shang, Taizhang; Tencent | en
local.contributor.affiliation | Liu, Weizhe; Tencent | en
local.contributor.affiliation | Chen, Tian; Tencent | en
local.contributor.affiliation | Dai, Xiaqiang; Tencent | en
local.contributor.affiliation | Ma, Chao; Shanghai Jiao Tong University | en
local.contributor.affiliation | Li, Hongdong; School of Computing, ANU College of Systems and Society, The Australian National University | en
local.contributor.affiliation | Ji, Pan; Tencent | en
local.identifier.doi | 10.1145/3680528.3687672 | en
local.identifier.pure | 8bd2af28-fa9e-4e0c-a262-b1db4a3f35f4 | en
local.identifier.url | https://www.scopus.com/pages/publications/85217101369 | en
local.type.status | Published | en