Comment on d wha

Yeah, if words were actually encoded as one-hot vectors, this would be pretty trivial, but the rest of LLM training would be somewhere between infeasible and impossible. The embedding vectors that are actually used obscure spelling even more.
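For anyone not familiar with the term, here is a toy sketch of what a one-hot encoding looks like. The four-word vocabulary is made up for illustration; the point is only that the vector has one slot per vocabulary entry, so the word (and therefore its spelling) can be read straight back from the position of the 1.

```python
def one_hot(index, vocab_size):
    """Return a list of zeros with a single 1 at position `index`."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec

# Hypothetical toy vocabulary; a real one has tens of thousands of entries,
# so each one-hot vector would be that long as well.
vocab = ["apple", "banana", "cherry", "strawberry"]

token_id = vocab.index("strawberry")
vec = one_hot(token_id, len(vocab))

# Recovering the word is just a lookup by the position of the 1.
recovered = vocab[vec.index(1)]
letter_count = recovered.count("r")

print(vec, recovered, letter_count)
```

A dense embedding replaces that long, mostly-zero vector with a much shorter list of learned floats, and there is no single position that identifies the word any more.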

Side note: last time I checked, embedding vectors were approximately 40-dimensional. Has that gone up significantly in the last couple of years?

Meron35@lemmy.world 2 months ago
A fair bit. EmbeddingGemma is an open-weights model, and its output embeddings can be anywhere from 128 to 768 dimensions.

It's not as simple as more dimensions = better, though, because of size, efficiency, and context rot limitations.
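As far as I understand it, the 128-768 range comes from Matryoshka-style training: the model emits a 768-dimensional vector and you can keep just the first N entries and re-normalize. A minimal sketch of that truncation step, assuming the vector is a plain list of floats (the function name and the 8-dimensional sample values are made up for illustration, not taken from any library):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` entries of `vec` and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]

# Made-up 8-dimensional vector standing in for a 768-dimensional one.
full = [0.5, -0.5, 0.5, 0.5, 0.1, -0.1, 0.0, 0.2]
small = truncate_embedding(full, 4)

print(len(full), len(small))
```

The shorter vector costs less to store and compare, which is the size and efficiency side of the trade-off.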

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings - Google Developers Blog - …googleblog.com/…/introducing-embeddinggemma/