
Comment on AI hallucinations are impossible to eradicate — but a recent, embarrassing malfunction from one of China’s biggest tech firms shows how they can be much more damaging there than in other countries


AndrasKrigare@beehaw.org ⁨1⁩ ⁨year⁩ ago

I think to some extent it’s a matter of scale, though. If I advertise something as a calculator capable of doing all math, and it can only do one problem, it is so drastically far away from its intended purpose that the meaning kinda breaks down. I don’t think it would be wrong to say “it malfunctions in 99.999999% of use cases” but it would be easier to say that it just doesn’t work.

Continuing (and torturing) that analogy, if we did the disgusting work of precomputing all two-number math problems for integers from -1,000,000 to 1,000,000, I think you could say you had a (really shitty and slow) calculator, which “malfunctions” for numbers outside that range if you don’t specify the limitation ahead of time. That’s not crazy different from software which has issues with max_int or small buffers.
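To make the analogy concrete, here is a rough sketch of what such a lookup-table “calculator” might look like. It only covers addition, and `LIMIT`, `ADD_TABLE` and `lookup_add` are made-up names for illustration; the range is shrunk to ±20 so the table actually fits in memory.

```python
from itertools import product

# Hypothetical small bound standing in for the 1,000,000 in the analogy.
LIMIT = 20

# Precompute every two-number addition for integers in [-LIMIT, LIMIT].
ADD_TABLE = {
    (a, b): a + b
    for a, b in product(range(-LIMIT, LIMIT + 1), repeat=2)
}


def lookup_add(a: int, b: int) -> int:
    """Return a + b from the precomputed table.

    Raises KeyError for operands outside [-LIMIT, LIMIT], which is the
    "malfunction" a user sees if the limitation was never stated.
    """
    return ADD_TABLE[(a, b)]
```

Inside the range it behaves like a calculator; `lookup_add(LIMIT + 1, 0)` fails with a `KeyError`, much like software that breaks at max_int.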

If it were the case that there had only been one case of a hallucination with LLMs, I think we could pretty safely call that a malfunction (and we wouldn’t be having this conversation). If it happens 0.000001% of the time, I think we could still call it a malfunction and that it performs better than a lot of software. If it happens 99.999% of the time, it’d be better to say that it just doesn’t work. I don’t think there is, or even needs to be, some unified understanding of where the line is between them.

Really my point is there are enough things to criticize about LLMs and people’s use of them, this seems like a really silly one to try and push.


lvxferre@mander.xyz ⁨1⁩ ⁨year⁩ ago

Really my point is there are enough things to criticize about LLMs and people’s use of them, this seems like a really silly one to try and push.

The comment that you’re replying to is fairly specifically criticising the usage of the word “hallucination” to misrepresent the nature of the undesirable LLM output, in the context of people selling you stuff by what it is not.

It is not “pushing” another “thing to criticise about LLMs”. OK? I have my fair share of criticism against LLMs themselves, but that is not what I’m doing right now.

Continuing (and torturing) that analogy, […] max_int or small buffers.

When we extend analogies they often break in the process. That’s the case here.

Originally the analogy works because it shows a phony selling a product by what it is not. By making the phony precompute 4*10¹² equations (a completely unrealistic situation), he stops being a phony and becomes a muppet doing things the hard way.

If it were the case that there had only been one case of a hallucination with LLMs, I think we could pretty safely call that a malfunction

If it happens 0.000001% of the time, I think we could still call it a malfunction and that it performs better than a lot of software.

Emphases mine. Those “ifs” represent a completely unrealistic situation that shows nothing useful about the real one.

We know that LLMs output “hallucinations” way more than just once, or 0.000001% of the time. They’re common enough to show you how LLMs work.
