Link to notebook here: huggingface.co/…/fusion_t2i_CLIP_interrogator.ipy…

The image shows the list of prompt items before/after running 'remove duplicates' on a subset of Adam Codd's huggingface repo of civitai prompts: huggingface.co/datasets/AdamCodd/…/main

Removing duplicates from the civitai prompts results in a roughly 90% reduction in the number of items!

From 4.8 million -> 0.417 million items.

If you wish to search this set, you can use the notebook above.

Unlike the typical pharmapsychotic CLIP interrogator, I pre-encode the text corpus ahead of time.
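The payoff of pre-encoding is that a search becomes a single matrix product instead of thousands of encoder passes. A minimal sketch of the idea, with random vectors standing in as placeholders for the real CLIP text encodings (in practice both the corpus and the query would come from CLIP's text encoder):

```python
import torch

# Placeholder: in practice these rows come from CLIP's text encoder (dim 768).
corpus = torch.randn(10_000, 768)                    # pre-encoded prompt corpus
corpus = corpus / corpus.norm(dim=-1, keepdim=True)  # L2-normalize once, offline

def search(query_enc: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Return indices of the k corpus prompts most similar to the query.

    Since the corpus is already encoded and normalized, each search is
    just one matrix-vector product (cosine similarity) plus a top-k.
    """
    q = query_enc / query_enc.norm()
    sims = corpus @ q
    return sims.topk(k).indices

# The query encoding would likewise come from CLIP's text encoder.
hits = search(torch.randn(768))
print(hits.shape)  # torch.Size([5])
```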

Additionally, I’m using quantization on the text corpus to store the encodings as unsigned integers (torch.uint8) instead of float32, using this formula:

quantized_value = round(float32_value / scale) + zero_point

For the CLIP encodings, I use a scale of 0.0043.

The TLDR is that you divide the float32 value by the scale (0.0043), round to the nearest integer, and then add the zero_point so that all values within the encoding are non-negative.

A typical zero_point value for a given encoding can be 0, 30, 120 or 250-ish.
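The steps above can be sketched as a quantize/dequantize pair. This is my own minimal reconstruction of the formula described in the post (the function names and the fake random "encoding" are illustrative, not from the actual notebook), using the stated scale of 0.0043:

```python
import torch

SCALE = 0.0043  # scale stated in the post for CLIP text encodings

def quantize(enc: torch.Tensor, scale: float = SCALE):
    """Quantize a float32 encoding to torch.uint8.

    zero_point is chosen so the smallest quantized value lands at 0,
    i.e. every stored value is non-negative.
    """
    q = torch.round(enc / scale)
    zero_point = int(max(0.0, -q.min().item()))  # shift so the minimum maps to 0
    q = (q + zero_point).clamp(0, 255).to(torch.uint8)
    return q, zero_point

def dequantize(q: torch.Tensor, zero_point: int, scale: float = SCALE):
    """Recover an approximate float32 encoding (inverse of the formula)."""
    return (q.to(torch.float32) - zero_point) * scale

# Round-trip demo on a fake 768-dim "encoding".
enc = torch.empty(768).uniform_(-0.3, 0.3)
q, zp = quantize(enc)
rec = dequantize(q, zp)
print(q.dtype, zp)  # torch.uint8, zero_point around 70 for this value range
```

As long as the value range fits into 256 levels at the chosen scale, the round-trip error is at most half a quantization step (scale / 2 ≈ 0.00215), which is why a single shared scale works for the whole corpus.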

In summary, a pretty useful setup for me when I need prompts for stuff.

//—//

I also have a 1.6-million-item set of fanfiction tags loaded from archiveofourown.org.

It's mostly character names.

They are listed as fanfic1 and fanfic2 respectively.

//—//

Upcoming plans are to include a visual representation of the text_encodings as colored cells within a 16x16 grid.

A color is an RGB value (3 integer values) within a given range, and 3 x 16 x 16 = 768, which happens to be the dimension of the CLIP encoding.

So a visual representation of a text_encoding could be shown like this, but with colored cells:

Image
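The mapping itself is just a reshape. A small sketch of how I'd assume the grid could be built (the helper name is hypothetical), pairing each of the 256 cells with one (R, G, B) triple from the uint8-quantized encoding:

```python
import torch

def encoding_to_grid(q_enc: torch.Tensor) -> torch.Tensor:
    """Reshape a 768-dim uint8 CLIP text encoding into a 16x16 RGB grid.

    768 = 3 x 16 x 16, so each of the 16*16 = 256 cells receives one
    (R, G, B) triple of uint8 values.
    """
    assert q_enc.shape == (768,)
    return q_enc.reshape(16, 16, 3)

# Fake quantized encoding standing in for a real one.
grid = encoding_to_grid(torch.randint(0, 256, (768,), dtype=torch.uint8))
print(grid.shape)  # torch.Size([16, 16, 3])
```

The uint8 quantization from earlier is convenient here, since the stored values are already in the 0-255 range that RGB channels expect.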

//—//

That's all for this update.