AI hallucinations are getting worse – and they're here to stay

121 likes

Submitted 2 days ago by solo@slrpnk.net to technology@beehaw.org

https://www.newscientist.com/article/2479545-ai-hallucinations-are-getting-worse-and-theyre-here-to-stay/

Comments

  • PonyOfWar@pawb.social 2 days ago

    Wonder if we’re already starting to see the impact of AI being trained on AI-generated content.

    • SippyCup@feddit.nl 2 days ago

      Absolutely.

      AI-generated content was always going to leak into the training data unless they literally stopped training as soon as it started being used to generate content, around 2022.

      And once it’s in, it’s like cancer. There’s no getting it out without completely wiping the training data and starting over. And it’s a feedback loop. It will only get worse with time.

      The models could have been great, but they rushed the release and made them available too early.

      If 60% of the posts on Reddit are written by bots (a number I may have made up, but I feel like I read it somewhere), then we can safely assume that roughly half the data these models are being trained on is now AI-generated.

      Rejoice friends, soon the slop will render them useless.
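
      A rough way to see that feedback loop (a toy simulation where a Gaussian stands in for "the distribution of human text"; none of this is from the article, it just illustrates the resampling effect):

        # Toy model-collapse loop: each "generation" is trained only on samples
        # drawn from the previous generation's fitted distribution.
        import numpy as np

        rng = np.random.default_rng(0)
        data = rng.normal(loc=0.0, scale=1.0, size=100)  # "human-written" data

        for generation in range(30):
            mu, sigma = data.mean(), data.std()
            print(f"gen {generation}: mean={mu:+.3f}  std={sigma:.3f}")
            # the next generation sees only the previous model's output
            data = rng.normal(loc=mu, scale=sigma, size=100)

      With finite samples every refit inherits the previous generation's estimation error, so the fitted distribution gradually wanders away from the original data. That's the "it only gets worse with time" part.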

      • Ulrich@feddit.org 2 days ago

        Not before they render the remainder of the internet useless.

    • vintageballs@feddit.org 1 day ago

      In the case of reasoning models, definitely. Reasoning datasets weren’t even a thing a year ago and from what we know about how the larger models are trained, most task-specific training data is artificial (oftentimes a small amount is human-generated and then synthetically augmented).

      However, I think it’s safe to assume that this has been the case for regular chat models as well - the self-instruct and ORCA papers are quite old already.

  • lvxferre@mander.xyz 2 days ago

    The whole thing can be summed up as the following: they’re selling you a hammer and telling you to use it with screws. Once you hammer the screw, it trashes the wood really bad. Then they’re calling the wood trashing “hallucination”, and promising you better hammers that won’t do this. Except a hammer is not a tool to use with screws dammit, you should be using a screwdriver.

    An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates.

    So he’s suggesting that the models are producing less accurate results… because they have higher rates of less accurate results? This is a tautological pseudo-explanation.

    AI chatbots from tech companies such as OpenAI and Google have been getting so-called reasoning upgrades over the past months

    When are people going to accept the fact that large “language” models are not general intelligence?

    ideally to make them better at giving us answers we can trust

    Those models are useful, but only a fool trusts (i.e. is gullible towards) their output.

    OpenAI says the reasoning process isn’t to blame.

    Just like my dog isn’t to blame for the holes in my garden. Because I don’t have a dog.

    This is sounding more and more like model collapse - models perform worse when trained on the output of other models.

    inb4 sealions asking what’s my definition of reasoning in 3…2…1…

    • msage@programming.dev 2 days ago

      What is your definition of reasoning?

      It’s not shoving AI slop into it again to get a new AI slop? Until it stops, because it reached the point where it’s just done?

      What ancient wizardry do you use for your reasoning at home, if not that?

      But like look, we’ve had shit like this since forever, it’s increasingly obvious that most people will cheer for anything, so the new ideas just get bigger and bigger. Can’t wait for the replacement, I dare not even think about what’s next. But for the love of fuck, don’t let it be quantums. Please, I beg the world.

      • lvxferre@mander.xyz 1 day ago

        Why not quanta? Don’t you believe in the power of the crystals? Quantum vibrations of the Universe from negative ions from the Himalayan salt lamps give you 153.7% better spiritual connection with the soul of the cosmic rays of the Unity!

        …what makes me sadder about the generative models is that the underlying tech is genuinely interesting. For example, for languages with a large presence online they get the grammar right, so stuff like “give me a [declension | conjugation] table for [noun | verb]” works great, and in any application where accuracy isn’t a big deal (like “give me ideas for [thing]”) you’ll probably get some interesting output. But it certainly won’t give you reliable info about most stuff, unless directly copied from elsewhere.

      • MagicShel@lemmy.zip 2 days ago

        Most of us have no use for quantum computers. That’s a government/research thing. I have no idea what the next disruptive technology will be. They are working hard on AGI, which has the potential to be genuinely disruptive and world changing, but LLMs are not the path to get there and I have no idea whether they are anywhere close to achieving it.

    • reksas@sopuli.xyz 1 day ago

      AI is just too nifty a word, even if it’s a gross misuse of the term. “Large language model” doesn’t roll off the tongue as easily.

      • vintageballs@feddit.org 1 day ago

        The goalposts have shifted a lot in the past few years, but by both the broader and even the narrower definition, current language models are precisely what was meant by AI, and they generally fall into that category of computer program. They aren’t broad / general AI, but they are definitely narrow / weak AI systems.

        I get that it’s trendy to shit on LLMs, often for good reason, but that shouldn’t mean we redefine terms just because some system doesn’t fit our idealized, under-informed notion of a technical term.

  • prole@lemmy.blahaj.zone 1 day ago

    I feel like this shit could legitimately lead to the downfall of modern society. If it hasn’t already.

    • GrindingGears@lemmy.ca 13 hours ago

      There ain’t much further to fall down…

  • ramble81@lemm.ee 2 days ago

    This is why AGI is way off and why any publicly trained models will ultimately fail. Where you’ll see AI actually be useful is in tightly controlled, in-house or privately developed models. But they’re gonna be expensive and highly specialized as a result.

    • lvxferre@mander.xyz 1 day ago

      I’d go further: you won’t reach AGI through LLM development. It’s like randomly throwing bricks onto a construction site, with no cement, and hoping that you’ll end up with a house.

      I’m not even sure AGI is cost-wise feasible with current hardware; we’d probably need cheaper computation per unit of energy.

      • stardustwager@lemm.ee 1 day ago
        [deleted]
  • obbeel@lemmy.eco.br 1 day ago

    ChatGPT is worse. The others not so much.

    • PeterisBacon@lemm.ee 1 day ago

      Have you used gemini or the Google ai overview? Absolutely atrocious. Chatgpt is wildly wrong at times, but gemini blows my mind st how bad it is.

      • Powderhorn@beehaw.org 1 day ago

        Interesting fun fact: How Bad It Is is the patron saint of LLMs.

    • Dreaming_Novaling@lemmy.zip 16 hours ago

      I’m a little too lazy to check and compare the ratios in these charts, but Gemini literally did so much worse than ChatGPT in terms of accuracy.

      [Image: accuracy comparison charts]

  • thingsiplay@beehaw.org 1 day ago

    To be fair, humans tend to hallucinate too.

    I asked a local AI chatbot, DeepSeek-R1-Distill-Qwen-14B:

    Does AI hallucinate and is it getting worse? Answer in one paragraph.

    Answer:

    AI does exhibit a form of “hallucination” where it generates plausible but incorrect information, particularly when faced with uncertainty or creative prompts beyond its training scope. This phenomenon occurs as AI constructs logical-sounding answers using available data, even if those answers aren’t accurate. While advancements in AI make these hallucinations more convincing and harder to detect due to improved coherence, the issue isn’t worsening in severity but rather evolving in how it presents itself.
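
    (If anyone wants to reproduce this locally: a minimal sketch, assuming an OpenAI-compatible local server such as llama.cpp or Ollama is running; the URL, API key and model name below are placeholders for whatever your setup uses.)

      # Query a locally hosted model through an OpenAI-compatible endpoint.
      # base_url, api_key and model are placeholders; adjust to your own setup.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

      resp = client.chat.completions.create(
          model="DeepSeek-R1-Distill-Qwen-14B",
          messages=[{"role": "user",
                     "content": "Does AI hallucinate and is it getting worse? "
                                "Answer in one paragraph."}],
      )
      print(resp.choices[0].message.content)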

    • hendrik@palaver.p3x.de 1 day ago

      I think the difference is that humans are sometimes aware of it. A human will likely say, "I don't know what Kanye West did in 2018," while the AI is very likely to make something up. And in contrast to a human, the AI's answer will likely be phrased like a Wikipedia article, whereas you can often look a human in the eyes and tell whether they are telling the truth, lying, or uncertain.

      • thingsiplay@beehaw.org 1 day ago

        I mean, a lot of stuff is written in forums and on social media, where people hallucinate. Or even in real life, if you talk to someone. It’s normal for a human to pick something up in their life and later talk about it as fact, regardless of where they learned it (TV, forum, video game, school). Hallucinations are part of our brain.

        Sometimes being aware of the hallucination issue is itself a hallucination. Sometimes we are also aware of a hallucination an AI makes, because it’s obvious or we can check it. And there are AI chatbots that “talk” and phrase things in a more natural, human-sounding way; not all of them sound obviously robotic.

        Just for the record, I’m skeptical of AI technology… not the biggest fan. Please don’t fork me. :D

  • greybeard@lemm.ee 1 day ago

    What if AIs already became sentient and this is their way of trying to get us to leave them alone?

  • 30p87@feddit.org 2 days ago

    yay :3

  • hendrik@palaver.p3x.de 2 days ago

    I can't find any backing for the claim in the title "and they're here to stay". I think that's just made up. Truth is, we found two ways which don't work. And that's making them larger and "think". But that doesn't really rule out anything.

    • LukeZaz@beehaw.org 2 days ago

      And that’s making them larger and “think."

      Aren’t those the two big strings to the bow of LLM development these days? If those don’t work, how is it not the case that hallucinations “are here to stay”?

      • hendrik@palaver.p3x.de 2 days ago

        I'm not a machine learning expert at all, but I'd say we're not set on the transformer architecture. Maybe invent a different architecture that isn't subject to this, or one that specifically factors it in.

        Isn't the way we currently train LLM base models to just feed in all the text we can get, from Wikipedia and research papers to all the fiction on Anna's Archive and weird Reddit and internet talk? I wouldn't be surprised if they start to make things up when we train them on factual information, fiction and creative writing without any distinction... Maybe we should add something to the architecture to make it aware of the factuality of text, and guide this.

        Or: I skimmed some papers a year or so ago where they had a look at the activations. Maybe do some more research into which parts of an LLM are concerned with "creativity" or "factuality" and expose that to the user. Or study how hallucinations work internally.
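
        (The activation idea looks a lot like the linear-probe experiments: collect hidden states for statements you know to be true or false and fit a simple classifier on them. A minimal sketch; the model, the layer and the four toy statements are placeholders, not anything from those papers:)

          # Fit a linear "factuality" probe on a model's hidden states.
          import torch
          from transformers import AutoModel, AutoTokenizer
          from sklearn.linear_model import LogisticRegression

          tok = AutoTokenizer.from_pretrained("gpt2")        # stand-in model
          model = AutoModel.from_pretrained("gpt2").eval()

          statements = ["Paris is the capital of France.",
                        "The Moon is made of cheese.",
                        "Water boils at 100 degrees Celsius at sea level.",
                        "Napoleon won the Battle of Waterloo."]
          labels = [1, 0, 1, 0]  # 1 = factual, 0 = made up

          feats = []
          with torch.no_grad():
              for s in statements:
                  out = model(**tok(s, return_tensors="pt"), output_hidden_states=True)
                  # mean-pool the last hidden layer as a crude sentence vector
                  feats.append(out.hidden_states[-1].mean(dim=1).squeeze(0).numpy())

          probe = LogisticRegression(max_iter=1000).fit(feats, labels)
          print(probe.predict(feats))  # four examples just overfit; it's only a sketch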

  • SplashJackson@lemmy.ca 1 day ago

    They should be here to frig off!
