HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation

Submitted 2 years ago by Even_Adder@lemmy.dbzer0.com to stable_diffusion@lemmy.dbzer0.com

https://i.imgur.com/gP8GoDx.png

Abstract

Despite the latest remarkable advances in generative modeling, efficient generation of high-quality 3D assets from textual prompts remains a difficult task. A key challenge lies in data scarcity: the most extensive 3D datasets encompass merely millions of assets, while their 2D counterparts contain billions of text-image pairs. To address this, we propose a novel approach which harnesses the power of large, pretrained 2D diffusion models. More specifically, our approach, HexaGen3D, fine-tunes a pretrained text-to-image model to jointly predict 6 orthographic projections and the corresponding latent triplane. We then decode these latents to generate a textured mesh. HexaGen3D does not require per-sample optimization, and can infer high-quality and diverse objects from textual prompts in 7 seconds, offering significantly better quality-to-latency trade-offs compared to existing approaches. Furthermore, HexaGen3D demonstrates strong generalization to new objects or compositions.
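
For readers unfamiliar with the "triplane" mentioned in the abstract, here is a minimal sketch of how a triplane is typically queried: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the results are combined. This is a generic illustration, not the paper's code. The function names, the plane keys (`xy`, `xz`, `yz`), the nested-list layout, and the choice to sum the three samples are all assumptions made for the example; the abstract does not say how HexaGen3D decodes its latents.

```python
from math import floor


def bilinear_sample(plane, u, v):
    """Bilinearly sample a feature plane at (u, v), both in [-1, 1].

    `plane` is a nested list indexed as plane[row][col][channel]
    (hypothetical layout chosen for this sketch; needs at least 2x2 cells).
    """
    h, w = len(plane), len(plane[0])
    # Map [-1, 1] to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    # Top-left corner of the enclosing cell, clamped so x0+1 / y0+1 stay in range.
    x0 = min(max(floor(x), 0), w - 2)
    y0 = min(max(floor(y), 0), h - 2)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = (1 - tx) * plane[y0][x0][c] + tx * plane[y0][x0 + 1][c]
        bottom = (1 - tx) * plane[y0 + 1][x0][c] + tx * plane[y0 + 1][x0 + 1][c]
        out.append((1 - ty) * top + ty * bottom)
    return out


def query_triplane(planes, point):
    """Return the feature vector for a 3D point in [-1, 1]^3.

    Assumption: features from the three planes are summed; other
    aggregations (e.g. concatenation) are also used in practice.
    """
    x, y, z = point
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]
```

In a full pipeline, a small network would usually map each returned feature vector to geometry and color values, from which a textured mesh can be extracted; that step is omitted here.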

Paper: arxiv.org/abs/2401.07727

Comments

toothpaste_sandwich@feddit.nl 2 years ago
www.youtube.com/watch?v=Oy5DAxGhV_c&t=87

tagginator@utter.online [bot] 2 years ago
New Lemmy Post: HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation (https://lemmy.dbzer0.com/post/12527472)
Tagging: #StableDiffusion
(Replying in the OP of this thread (NOT THIS BOT!) will appear as a comment in the lemmy discussion.)
I am a FOSS bot. Check my README: https://github.com/db0/lemmy-tagginator/blob/main/README.md