utopiah@lemmy.world 1 day ago
Does it only use that, or does it also use an LLM too?
Hackworth@piefed.ca 1 day ago
The Firefly image generator is a diffusion model, and the Firefly video generator is a diffusion transformer. LLMs aren’t involved in either process. I believe there are some ChatGPT integrations with Reader and Acrobat, but that’s unrelated to Firefly.
utopiah@lemmy.world 17 hours ago
Surprising; I'd expect it to rely at some point on something like CLIP in order to be prompted.
Hackworth@piefed.ca 3 hours ago
As I understand it, CLIP (and other text encoders in diffusion models) aren't trained like LLMs, exactly. They're trained on image/text pairs, which you get from the metadata creators upload with their photos on Adobe Stock. That said, Adobe hasn't published their entire architecture.
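For reference, the image/text pairing objective CLIP uses is contrastive: embed each image and its caption, then push matched pairs together and mismatched pairs apart within a batch. Here's a minimal NumPy sketch of that symmetric contrastive loss — a generic illustration of the technique, not Adobe's (unpublished) implementation; the embeddings are random toy vectors standing in for real encoder outputs.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over a batch of (image, text) pairs."""
    # Normalize embeddings to unit length so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    # Pairwise similarity matrix: matched pairs sit on the diagonal
    logits = img @ txt.T / temperature
    n = logits.shape[0]

    def cross_entropy(l):
        # Row-wise softmax cross-entropy with the diagonal as the correct class
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

# Toy batch: 3 matched embedding pairs (identical vectors = perfectly matched)
rng = np.random.default_rng(0)
pairs = rng.normal(size=(3, 8))
loss_matched = clip_contrastive_loss(pairs, pairs)
# Shuffling the captions mismatches every pair, so the loss rises sharply
loss_shuffled = clip_contrastive_loss(pairs, np.roll(pairs, 1, axis=0))
print(loss_matched < loss_shuffled)
```

The training signal is just "which caption in this batch belongs to which image," which is why the metadata on stock uploads is enough to train the text encoder without any LLM-style next-token objective.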