Comment

Comment on d wha

mercano@lemmy.world ⁨2⁩ ⁨months⁩ ago

AI doesn’t see a word as a sequence of letters; it just sees it as a pointer to an entry in a Words table.

Viceversa@lemmy.world ⁨2⁩ ⁨months⁩ ago
Semantic Vectors don’t work that way.

source
- lambdabeta@lemmy.ca ⁨2⁩ ⁨months⁩ ago
  Yeah, if words were actually encoded as one-hot vectors this would be pretty trivial, but the rest of LLM training would be somewhere between infeasible and impossible. The actual embedding vectors obscure spelling even more.
  
  Side note: last time I checked, current embedding vectors were approximately 40-dimensional… Has that gone up significantly in the last couple of years?
  
  source
  - Meron35@lemmy.world ⁨2⁩ ⁨months⁩ ago
    A fair bit. EmbeddingGemma is an open-weights model and allows for 128 to 768 dimensions.
    
    It’s not as simple as more dimensions = better, though, because of size, efficiency, and context-rot limitations.
    
    Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings - Google Developers Blog - …googleblog.com/…/introducing-embeddinggemma/
    
    source
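
To make the one-hot vs. embedding point above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the four-entry vocabulary, the 8-dimensional size, and the random values standing in for weights that a real model would learn. It only shows that multiplying a one-hot vector by an embedding table is the same as looking up one row, and that the resulting row carries no letters.

```python
import random

# Hypothetical toy vocabulary; real models have tens of thousands of tokens.
VOCAB = ["straw", "berry", "_", "r"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def one_hot(token_id, vocab_size):
    """Return a vector of zeros with a single 1.0 at position token_id."""
    vec = [0.0] * vocab_size
    vec[token_id] = 1.0
    return vec


def make_embedding_table(vocab_size, dim, seed=0):
    """Random stand-in for a learned table with one dense row per token."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(token_id, table):
    """Embedding lookup: the token ID is just an index into the table."""
    return table[token_id]


def one_hot_times_table(one_hot_vec, table):
    """Multiply a one-hot row vector by the table (same result as embed)."""
    dim = len(table[0])
    return [sum(one_hot_vec[i] * table[i][j] for i in range(len(table)))
            for j in range(dim)]


table = make_embedding_table(len(VOCAB), dim=8)
berry_id = TOKEN_TO_ID["berry"]
berry_vec = embed(berry_id, table)

# The dense vector is a list of floats; no characters of "berry" survive.
print(berry_id, len(berry_vec))
```

Counting letters from `berry_vec` alone would require the model to have memorised spelling somewhere in its weights, since the vector itself is only a row of numbers.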
chicken@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
Shouldn’t it help that it separated them out with underlines? How does this text break down in terms of tokens?

source
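
On the tokens question, a rough sketch may help. This is a toy greedy longest-match tokenizer with a made-up vocabulary (the thread doesn't quote the original text, so the input string is also made up). Real tokenizers such as BPE work differently and have their own vocabularies, so treat this only as an illustration of why separators don't guarantee one token per letter.

```python
# Hypothetical vocabulary: note it contains merged pieces like "_t" and "_r".
VOCAB = {"straw", "raw", "st", "_t", "_r", "s", "t", "r", "a", "w", "_"}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into vocabulary entries."""
    max_len = max(len(tok) for tok in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


print(tokenize("straw"))      # ['straw']
print(tokenize("s_t_r_a_w"))  # ['s', '_t', '_r', '_', 'a', '_', 'w']
```

In this toy setup the separated version does expose more letters, but some separators get glued to the next letter (`_t`, `_r`), so the model still sees opaque IDs rather than a clean letter-by-letter sequence. Whether that happens with a given model depends entirely on its actual tokenizer.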
echodot@feddit.uk ⁨2⁩ ⁨months⁩ ago
Oh thank God. I was worried that I was really stupid.

source