This article illustrates rather well three reasons why I don’t like the term “hallucination” for LLM output.
- It’s a catch-all term that describes neither the nature nor the gravity of the problematic output. Failure to address the prompt? False output, fake info? Immoral and/or harmful output? Pasting verbatim training data? Output that is supposed to be moderated against? It’s all “hallucination”.
- It implies that, under the hood, the LLM is “malfunctioning”. It is not - it’s doing exactly what it is supposed to do: chaining tokens through weighted probabilities (see the sketch after this list). Contrary to the tech bros’ wishful belief, LLMs do not pick words based on the truth value or morality of the output. That’s why hallucinations won’t go away, at least not with the current architecture of text generators.
- It lumps those incorrect outputs together with what humans produce in situations of poor reasoning. This “it works like a human” metaphor obscures what is happening instead of clarifying it.
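For anyone curious what “chaining tokens through weighted probabilities” looks like, here is a minimal, purely illustrative sketch in Python. The `toy_model`, the tiny `vocab`, and the `temperature` value are made up for the example; real LLMs are vastly larger and more sophisticated, but the generation loop has the same basic shape, and nothing in it checks truth or morality.

```python
# Toy sketch of the sampling loop at the heart of a text generator.
# The "model" here is a stand-in returning random scores; a real LLM
# computes these scores with a huge neural network, but the loop is the same.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "moon", "is", "made", "of", "cheese", "."]

def toy_model(tokens):
    """Stand-in for the network: returns one score (logit) per vocabulary word."""
    return rng.normal(size=len(vocab))

def generate(prompt_tokens, max_new_tokens=10, temperature=1.0):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = toy_model(tokens)
        probs = np.exp(logits / temperature)
        probs /= probs.sum()                       # softmax: weighted probabilities
        next_id = rng.choice(len(vocab), p=probs)  # sample the next token
        tokens.append(vocab[next_id])              # chain it onto the output
    return " ".join(tokens)

print(generate(["the", "moon", "is"]))
# Note what is *absent*: no step compares the output against facts, sources,
# or any notion of harm. The loop only ever asks "which token is statistically
# likely here?", which is why wrong output is normal behaviour, not a glitch.
```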
On the main topic of the article: are LLMs useful? Sure! I use them myself. However, only a fool would try to shove LLMs everywhere, with no regard for how intrinsically [yes] unsafe they are. And yet that’s what big tech is doing, regardless of whether it’s Chinese or United-Statian or Russian or German or whatever.
HobbitFoot@thelemmy.club 2 months ago
I feel like “hallucination” was chosen as the word because of what it implies.
It doesn’t imply a bad algorithm, which would make the company look bad; hallucinations are out of a person’s control, so nobody gets blamed. For the same reason, it doesn’t imply poor training data either.
But “hallucination” also masks how immature the model actually is. A small kid might say something racist based on what they grew up around, and we would call that child immature. The same applies when an AI doesn’t fully understand a question or repeats a wrong answer that someone fed it as a joke.
No one is at fault. It was just a hallucination.