Comment on Hexadecimal
morrowind@lemmy.ml 2 weeks ago
Not really a concern. It’s basically translation, which language models excel at. It just needs a mapping from hex to bytes.
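To make the "mapping" concrete, here is a minimal Python sketch of what translating hex to bytes amounts to when every character is visible. The names `HEX_TO_BYTE` and `hex_to_bytes` are made up for illustration; the standard library's `bytes.fromhex` does the same job.

```python
# Illustrative lookup table: each two-character hex pair maps to one byte value.
# HEX_TO_BYTE and hex_to_bytes are hypothetical names, not from any library.
HEX_TO_BYTE = {f"{value:02x}": value for value in range(256)}


def hex_to_bytes(hex_string: str) -> bytes:
    """Translate a hex string to bytes, one character pair at a time."""
    cleaned = hex_string.strip().lower()
    if len(cleaned) % 2 != 0:
        raise ValueError("hex string must have an even number of characters")
    # Split into consecutive two-character pairs and look each one up.
    pairs = (cleaned[i:i + 2] for i in range(0, len(cleaned), 2))
    return bytes(HEX_TO_BYTE[pair] for pair in pairs)


print(hex_to_bytes("48656c6c6f"))  # b'Hello'
```

The lookup is trivial once the input is split into clean two-character pairs; whether a model effectively gets such pairs is what the rest of the thread is about.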
GissaMittJobb@lemmy.ml 2 weeks ago
It is a concern.
Check out tiktokenizer.vercel.app/?model=deepseek-ai%2FDeep… and try entering some freeform hexadecimal data - you’ll notice that it does not cleanly segment the hexadecimal numbers into individual tokens.
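If you don't want to open the site, the effect can be sketched with a toy example. Everything here is an assumption for illustration: `TOY_VOCAB` is an invented vocabulary, and greedy longest-match is a simplification, since real tokenizers typically use learned BPE merges. It only shows how token boundaries can fall in the middle of a byte.

```python
# Toy vocabulary (invented for illustration, not from any real model):
# a few multi-character hex fragments plus every single hex digit as a fallback.
TOY_VOCAB = {"486", "56c", "6c", "6f", "48", "65"} | set("0123456789abcdef")


def greedy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    tokens = []
    pos = 0
    max_len = max(len(entry) for entry in vocab)
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                tokens.append(piece)
                pos += length
                break
        else:
            raise ValueError(f"no vocabulary entry matches at offset {pos}")
    return tokens


def byte_aligned(tokens):
    """For each token, report whether it starts and ends on a two-character byte boundary."""
    flags = []
    offset = 0
    for token in tokens:
        flags.append(offset % 2 == 0 and len(token) % 2 == 0)
        offset += len(token)
    return flags


tokens = greedy_tokenize("48656c6c6f")
print(tokens)                # ['486', '56c', '6c', '6f']
print(byte_aligned(tokens))  # [False, False, True, True]
```

In this toy run, the first two tokens each straddle a byte boundary, so no single token corresponds to the byte `0x48` or `0x65`.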
morrowind@lemmy.ml 2 weeks ago
I’m well aware, but you don’t necessarily need to see each character to translate to bytes.
GissaMittJobb@lemmy.ml 2 weeks ago
It’s not out of the question that we get emergent behaviour where the model can connect non-optimally mapped tokens and still translate them correctly, yeah.
kautau@lemmy.world 2 weeks ago
I’m confused, is the concern when the model doesn’t properly identify when it is using software to identify something like a hex pattern?