Huh that was an interesting read. I’m not sure I understood the entirety of it but it sounds like it would be a lot more efficient than what I was thinking
morhp@lemmy.wtf 1 year ago
Yes, it’s very efficient and the core of what compression formats like .zip do.
The main difference from your idea is that computers count in binary (0, 1, 10, 11, 100, 101 and so on), and you don’t want to assign these very short codes to words directly. Say you assigned 1 to the most common word. That word would be encoded very short, but you’d effectively take the 1 away from all the other codes, because you couldn’t tell whether 11 means the most common word twice or the 11th (3rd in decimal) most common word once.
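To make the ambiguity concrete, here’s a tiny sketch (the words and codes are made up for illustration) that tries every way to split a bit string under a code where 1 and 11 are both assigned:

```python
# Hypothetical non-prefix-free code: "1" is the most common word,
# "11" is the 3rd most common. Now "11" has two valid readings.
codes = {"1": "the", "11": "cat"}

def decodings(bits):
    """Return every way the bit string can be split into code words."""
    if not bits:
        return [[]]
    results = []
    for code, word in codes.items():
        if bits.startswith(code):
            for rest in decodings(bits[len(code):]):
                results.append([word] + rest)
    return results

print(decodings("11"))  # two readings: [['the', 'the'], ['cat']]
```

That’s why real codes are “prefix-free”: no code word is allowed to be the start of another, so decoding is never ambiguous.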
Huffman essentially computes the optimal word assignment mathematically.
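If you’re curious, the algorithm itself is short: repeatedly merge the two least frequent symbols into a tree node until one tree remains, then read the codes off the tree. A minimal Python sketch (my own illustration, not from any particular library):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build an optimal prefix-free code from symbol frequencies."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreaker, tree); a tree is either
    # a symbol or a (left, right) pair of subtrees.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent trees; rare symbols end up deeper,
        # so they get longer codes.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, n, (t1, t2)))
        n += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # edge case: only one symbol
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")
# 'a' occurs 5 times, so it gets a code at least as short as any other letter's.
```

The result is guaranteed prefix-free because every symbol sits at a leaf of the tree, so no code can be the start of another.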
The other main difference between your suggestion and most compression algorithms is that you can’t use a huge dictionary in real time: loading it and looking things up in it would be very slow. Most compression algorithms have a rather small dictionary built in and/or build one on the fly from the data they’re compressing.
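You can see the on-the-fly approach in action with Python’s standard zlib module (the same DEFLATE scheme that .zip uses, combining a sliding-window match finder with Huffman coding). No external word list is shipped; the repeated phrases in the input itself become the dictionary:

```python
import zlib

# Highly repetitive input: the algorithm discovers the repeats itself.
data = b"the cat sat on the mat, " * 200

compressed = zlib.compress(data)
print(len(data), len(compressed))  # the compressed form is far smaller

# Decompression recovers the input exactly, with no shared word list needed.
assert zlib.decompress(compressed) == data
```

This is also why compressing already-random or already-compressed data gains nothing: there are no repeats for the on-the-fly dictionary to exploit.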