SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Submitted ⁨⁨1⁩ ⁨year⁩ ago⁩ by ⁨Even_Adder@lemmy.dbzer0.com⁩ to ⁨stable_diffusion@lemmy.dbzer0.com⁩

https://lemmy.dbzer0.com/pictrs/image/9189dc4d-0430-4983-851b-d1002ed9e9ad.png

Abstract

Recent advancements in diffusion models have positioned them at the forefront of image generation. Despite their superior performance, diffusion models are not without drawbacks; they are characterized by complex architectures and substantial computational demands, resulting in significant latency due to their iterative sampling process. To mitigate these limitations, we introduce a dual approach involving model miniaturization and a reduction in sampling steps, aimed at significantly decreasing model latency. Our methodology leverages knowledge distillation to streamline the U-Net and image decoder architectures, and introduces an innovative one-step DM training technique that utilizes feature matching and score distillation. We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU, respectively. Moreover, our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.

Paper: arxiv.org/abs/2403.16627

Code: github.com/IDKiro/sdxs

Model: huggingface.co/IDKiro/sdxs-512-0.9

Project Page: idkiro.github.io/sdxs/

Image

source

Comments

Sort:hotnew top

bort@sopuli.xyz ⁨1⁩ ⁨year⁩ ago
how does it compare to this? tianweiy.github.io/dmd/

source