This is an open ended question.
I’m not looking for a specific answer, just what people know about this topic.
I’ve asked this question on the Hugging Face Discord as well.
But hey, asking on Lemmy is always good, right? Plus this might serve as an “update” of sorts from the previous post.
//----//
Question:
The FLUX model uses a combination of CLIP and T5 to create a text_encoding.
CLIP is capable of doing both image_encoding and text_encoding.
The T5 model seems to be strictly text-to-text.
So I can’t use T5 to create image_encodings, right?
huggingface.co/docs/transformers/model_doc/t5
But nonetheless, the T5 encoder is used in text-to-image generation.
So surely, there must be good uses for the T5 in creating a better CLIP interrogator?
Ideas/examples on how to do this?
I have 0% knowledge of T5, so feel free to just send me a link someplace if you don’t want to type out an essay.
//----//
For context:
I’m making my own version of a CLIP interrogator: colab.research.google.com/#fileId=https%3A//huggi…
The key difference is that this one samples the CLIP-vit-large-patch14 tokens directly instead of using pre-written prompts.
I text_encode the tokens individually and store them in a list for later use.
I’m using the method shown in this paper, “NND” (nearest-neighbor decoding).
Methods for making better CLIP interrogators: arxiv.org/pdf/2303.03032
T5 encoder paper: arxiv.org/pdf/1910.10683
Example from the notebook, where I’m using the NND method on 49K CLIP tokens (Roman girl image):
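A minimal sketch of the nearest-neighbor lookup described above: rank stored token encodings by cosine similarity against one image encoding and keep the top matches. The vectors here are made-up 3-dimensional stand-ins, and `nearest_tokens` is a hypothetical helper name, not something from the notebook or the paper. In the real setup the encodings would come from the CLIP text and image encoders.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def nearest_tokens(query, token_encodings, top_k=3):
    """Return the top_k (token, cosine similarity) pairs for the query vector."""
    q = normalize(query)
    scored = []
    for token, enc in token_encodings.items():
        t = normalize(enc)
        scored.append((token, sum(a * b for a, b in zip(q, t))))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins (assumption): real CLIP encodings have hundreds of dimensions.
toy_token_encodings = {
    "blue": [0.9, 0.1, 0.0],
    "war": [0.1, 0.9, 0.1],
    "storm": [0.0, 0.2, 0.9],
    "indigo": [0.8, 0.2, 0.1],
}
toy_image_encoding = [1.0, 0.0, 0.0]

ranked = nearest_tokens(toy_image_encoding, toy_token_encodings, top_k=2)
print(ranked)
```

Normalizing both sides first means the ranking depends only on direction, not on vector length. With a large vocabulary, the same ranking is usually done as one matrix multiplication instead of a Python loop.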
Most similar suffix tokens : “{vfx |cleanup |warcraft |defend |avatar |wall |blu |indigo |dfs |bluetooth |orian |alliance |defence |defenses |defense |guardians |descendants |navis |raid |avengersendgame }”
Most similar prefix tokens : “{imperi-|blue-|bluec-|war-|blau-|veer-|blu-|vau-|bloo-|taun-|kavan-|kair-|storm-|anarch-|purple-|honor-|spartan-|swar-|raun-|andor-}”
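For readers wondering where the prefix/suffix split comes from: CLIP's BPE vocabulary marks word-final tokens with a trailing `</w>`, and tokens without it can only start or continue a word. The sketch below splits a toy vocabulary on that marker and prints it in the same `{a |b }` / `{a-|b-}` style as the lists above. The helper name `split_prefix_suffix` and the exact formatting are assumptions for illustration and may not match what the notebook does.

```python
END_OF_WORD = "</w>"  # marker on word-final tokens in CLIP's BPE vocabulary


def split_prefix_suffix(vocab):
    """Split BPE tokens into word-start pieces (prefixes) and word-final pieces (suffixes)."""
    prefixes, suffixes = [], []
    for token in vocab:
        if token.endswith(END_OF_WORD):
            # Word-final token: drop the marker, show a trailing space instead.
            suffixes.append(token[: -len(END_OF_WORD)] + " ")
        else:
            # Token that must be followed by more characters: show a trailing hyphen.
            prefixes.append(token + "-")
    return prefixes, suffixes


# Toy vocabulary (assumption): the real one has roughly 49K entries.
toy_vocab = ["blue</w>", "blu", "war</w>", "imperi", "defense</w>"]
prefixes, suffixes = split_prefix_suffix(toy_vocab)

print("{" + "|".join(suffixes) + "}")
print("{" + "|".join(prefixes) + "}")
```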
erenkoylu@lemmy.ml 2 months ago
T5 is a very mediocre LM. Why do people keep using it?
AdComfortable1514@lemmy.world 2 months ago
Hmm. I mean, the FLUX model looks good, so maybe there is some magic to it?
T5 on Hugging Face: huggingface.co/docs/transformers/model_doc/t5
T5 paper: arxiv.org/pdf/1910.10683
Any suggestions on what LLM I ought to use instead of T5?
erenkoylu@lemmy.ml 2 months ago
Aya-based LLMs are extremely powerful. Same with Qwen.