MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices

Submitted 2 years ago by Even_Adder@lemmy.dbzer0.com to stable_diffusion@lemmy.dbzer0.com

https://i.imgur.com/jMDFsdj.png

Abstract

The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and slow inference speed. In this paper, we propose MobileDiffusion, a highly efficient text-to-image diffusion model obtained through extensive optimizations in both architecture and sampling techniques. We conduct a comprehensive examination of model architecture design to reduce redundancy, enhance computational efficiency, and minimize the model’s parameter count, while preserving image generation quality. Additionally, we employ distillation and diffusion-GAN finetuning techniques on MobileDiffusion to achieve 8-step and 1-step inference, respectively. Empirical studies, conducted both quantitatively and qualitatively, demonstrate the effectiveness of our proposed techniques. MobileDiffusion achieves a remarkable sub-second inference speed for generating a 512×512 image on mobile devices, establishing a new state of the art.

Paper: arxiv.org/abs/2311.16567

Comments

tagginator@utter.online [bot] 2 years ago
New Lemmy Post: MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices (https://lemmy.dbzer0.com/post/13491532)
Tagging: #StableDiffusion
(Replying in the OP of this thread (NOT THIS BOT!) will appear as a comment in the lemmy discussion.)
I am a FOSS bot. Check my README: https://github.com/db0/lemmy-tagginator/blob/main/README.md