SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing

Submitted 2 years ago by Even_Adder@lemmy.dbzer0.com to stable_diffusion@lemmy.dbzer0.com

https://i.imgur.com/jX0G4Vr.png

Abstract

Text-conditional image editing based on large diffusion generative models has attracted the attention of both industry and the research community. Most existing methods perform non-reference editing: the user can provide only a source image and a text prompt. This restricts the user's control over the characteristics of the editing outcome. To increase user freedom, we propose a new task called Specific Reference Condition Real Image Editing, which allows the user to provide a reference image to further control the outcome, such as replacing an object with a particular one. To accomplish this, we propose a fast baseline method named SpecRef. Specifically, we design a Specific Reference Attention Controller to incorporate features from the reference image, and adopt a mask mechanism to prevent interference between editing and non-editing regions. We evaluate SpecRef on typical editing tasks and show that it can achieve satisfactory performance. The source code is available in the repository linked below.
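For readers who want a feel for what "incorporate reference features, gated by a mask" could look like, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not spell out how the controller works, so the function names (`attend`, `masked_reference_attention`), the per-token `edit_mask`, and the linear blend rule are all hypothetical choices made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def masked_reference_attention(queries, src_keys, src_values,
                               ref_keys, ref_values, edit_mask):
    """Blend source and reference attention outputs per token.

    ASSUMPTION (not from the paper): edit_mask[i] is in [0, 1], where 1 means
    token i lies in the editing region and should draw on reference features,
    and 0 means it should only see the source image's own features.
    """
    out = []
    for q, m in zip(queries, edit_mask):
        src_out = attend(q, src_keys, src_values)
        ref_out = attend(q, ref_keys, ref_values)
        out.append([m * r + (1.0 - m) * s for r, s in zip(ref_out, src_out)])
    return out


# Toy usage: two tokens, the second one marked as the editing region.
queries = [[1.0, 0.0], [0.0, 1.0]]
src_keys = [[1.0, 0.0], [0.0, 1.0]]
src_values = [[1.0, 0.0], [0.0, 1.0]]
ref_keys = [[1.0, 0.0], [0.0, 1.0]]
ref_values = [[5.0, 5.0], [7.0, 7.0]]
edit_mask = [0.0, 1.0]

result = masked_reference_attention(
    queries, src_keys, src_values, ref_keys, ref_values, edit_mask
)
```

With this mask, the first token's output depends only on the source features and the second only on the reference features, which is the "no interference between editing and non-editing regions" idea in its simplest form. The actual method operates inside a diffusion model's attention layers; see the linked repository for what the authors really do.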

Paper: arxiv.org/abs/2401.03433

Code: github.com/jingjiqinggong/specp2p


Comments


tagginator@utter.online [bot] 2 years ago
New Lemmy Post: SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing (https://lemmy.dbzer0.com/post/12054265)
Tagging: #StableDiffusion
(Replying in the OP of this thread (NOT THIS BOT!) will appear as a comment in the lemmy discussion.)
I am a FOSS bot. Check my README: https://github.com/db0/lemmy-tagginator/blob/main/README.md