# ABSTRACT

Text-conditioned image generation models are widely used for AI image synthesis, yet giving an artist intuitive control over the output remains challenging. Current methods require multiple reference images and a textual prompt for each object to specify it as a concept before a single customized image can be generated. DiffMorph, in contrast, introduces a novel approach that synthesizes images mixing concepts without any textual prompts. It integrates a sketch-to-image module so that user sketches serve directly as input: DiffMorph takes an initial image together with conditioning artist-drawn sketches and generates a morphed image. We employ a pre-trained text-to-image diffusion model and fine-tune it to reconstruct each input image faithfully, then seamlessly merge the images and the concepts from the sketches into a cohesive composition. We demonstrate the image generation capability of our work through our results and a comparison with prompt-based image generation.
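To make the fine-tuning step concrete, below is a minimal illustrative sketch (not the authors' code) of fine-tuning a pre-trained text-to-image diffusion model so it faithfully reconstructs a single target image, in the spirit of the per-image fine-tuning described above. It assumes the Hugging Face `diffusers` library; the model name, the file name `target.png`, the empty conditioning prompt, and all hyper-parameters are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch: fine-tune a pre-trained diffusion model to reconstruct one image.
# Model name, file paths, and hyper-parameters are assumptions for demonstration only.
import torch
from diffusers import StableDiffusionPipeline, DDPMScheduler
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
vae, unet = pipe.vae, pipe.unet
tokenizer, text_encoder = pipe.tokenizer, pipe.text_encoder
noise_scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler"
)

# Target image the model should learn to reconstruct.
image = Image.open("target.png").convert("RGB")
pixels = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])(image).unsqueeze(0).to(device)

# Fixed, empty conditioning: the method itself avoids textual prompts.
tokens = tokenizer([""], padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt").to(device)
with torch.no_grad():
    cond = text_encoder(tokens.input_ids)[0]
    latents = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)
unet.train()
for step in range(500):  # illustrative number of fine-tuning steps
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps, (1,), device=device)
    noisy = noise_scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=cond).sample
    # Standard denoising objective: predict the added noise.
    loss = torch.nn.functional.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

After this fine-tuning, the adapted model can be sampled with sketch-based conditioning to produce the morphed composition; the exact sketch-to-image module and merging procedure are described in the paper.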

Paper: arxiv.org/abs/2401.00739
