Comment on OpenAI admits AI hallucinations are mathematically inevitable, not just engineering flaws

Technus@lemmy.zip ⁨3⁩ ⁨hours⁩ ago

Beyond proving that hallucinations are inevitable, the OpenAI research revealed that industry evaluation methods actively encourage the problem. An analysis of popular benchmarks, including GPQA, MMLU-Pro, and SWE-bench, found that nine out of ten major evaluations used binary grading that penalized “I don’t know” responses while rewarding incorrect but confident answers.

“We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty,” the researchers wrote.
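The incentive the researchers describe can be sketched numerically: under binary grading, an abstention scores 0, the same as a wrong answer, so a model that guesses with any accuracy p > 0 outscores one that admits uncertainty. A minimal sketch of that scoring asymmetry (hypothetical grading functions, not the benchmarks' actual code):

```python
# Sketch of the incentive under binary grading (hypothetical functions,
# not taken from GPQA/MMLU-Pro/SWE-bench themselves).

def binary_grade(answer: str, correct: str) -> float:
    # 1 for a correct answer, 0 otherwise -- "I don't know" scores 0,
    # exactly like a confidently wrong answer.
    return 1.0 if answer == correct else 0.0

def expected_score(p_correct: float, abstain: bool) -> float:
    # Expected binary score over many questions: abstaining always
    # yields 0, while guessing yields the model's accuracy p_correct.
    return 0.0 if abstain else p_correct

# Even a model that is right only 10% of the time is rewarded
# for guessing over acknowledging uncertainty:
print(expected_score(0.10, abstain=False))  # 0.1
print(expected_score(0.10, abstain=True))   # 0.0
```

Any grading scheme with this shape makes guessing a dominant strategy, which is the paper's point: the benchmarks, not just the models, reward confident fabrication.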

I just wanna say I called this out nearly a year ago: lemmy.zip/comment/13916070
