It’s known that AI companies will harvest content without care for its veracity and train LLMs on it. These LLMs will then regurgitate that content as fact.
This isn’t a particularly novel finding, but the experiment illustrates it rather well.
The researchers you consider to have acted so immorally did add useless information to the knowledge pool – but it was unadvertised and immediately recognizable as nonsense, the kind any sane reviewer would’ve flagged. They included clues like thanking someone at Starfleet Academy for letting them use a lab aboard the USS Enterprise. They claimed to have gotten funding from the Sideshow Bob Foundation. Subtle.
By planting this easily traceable nonsense, they were able to turn the general but informal understanding that LLMs will repeat bullshit into a hard scientific data point that others can build on. Nothing world-changing, but still valuable. They basically did what Alan Sokal did.
Instead of worrying about this experiment, you should worry about all the misinformation in LLMs that wasn’t provided (and diligently documented) by well-meaning researchers.
TheFogan@programming.dev 5 hours ago
I think that’s the problem, though; I think the poop in the yard is a better example. The key is that the researchers published that information as speculation. It’s like if Anderson Cooper made up a fake news story, posted it in an anonymous tweet to analyze how far it would spread, and then Fox News picked it up and ran with the story all day.
That’s the key problem: people are trusting LLMs to do their research for them, when LLMs just mindlessly gather all the information they can get their hands on.
If they send a misinformative article to a place for untested, unproven, random speculation with a very low bar for who can submit… and it comes back out of the model, they can determine that LLMs are looking there. The key thing to note is that it’s not their fake disease that’s the threat. It’s that if the LLM found their fake article, it probably also scooped up a ton of other misinformed or dubious things.
Let’s look at it this way: say it was a cake, but we threw it in the garbage. Two weeks later we find the same cake… at Jim’s bakery, same ID, same distinct marker we put on it.
What does that tell us? It tells us that Jim’s bakery is, at least sometimes, dumpster diving and putting up for sale things that are clearly dangerous.
chemical_cutthroat@lemmy.world 5 hours ago
That isn’t a fault in the LLM, though; that is a fault in the general make-up of human skepticism, or lack thereof. We didn’t invent the word ‘Propaganda’ without having a sentence to use it in. Those who don’t practice skepticism, critical thinking, and even mild reasoning are the ones who will get led astray. That didn’t just start happening when LLMs came around; it’s been here since we first started talking to each other. It’s only more visible now because everything is more visible now. The world is more connected than it has ever been, and that grows with every passing day. All these fucking idiots who don’t double-check what they are being told are the problem, regardless of whether it came from an LLM or a human, because I guarantee you they are being led astray by both. They don’t trust the machine because it’s a machine; they trust what they are told because they are lazy. That isn’t the LLM’s fault.
TheFogan@programming.dev 5 hours ago
I mean, it’s a problem in the marketing and common usage of LLMs. That’s exactly it, though: LLM companies, and people in general, are describing LLMs as a way to do research.
You could say these criticisms apply to things like Wikipedia too, i.e. anyone can write what they want. But what does Wikipedia require? Right, every single claim has to be cited. So if you find misinformation on Wikipedia, you click on the footnote number and see where it came from.
If you ask ChatGPT “What diseases should I be concerned about in Africa?”, it lists you a few. You can then… Google it, find the Wikipedia page, and look at what’s there. It’s a tool without a purpose at that point, because it literally doesn’t save you any steps. It doesn’t guide you to the source so you can check its facts, and when it does give you sources, it may or may not be making them up. At which point it has no factual use, nor even any use in directing you to the facts.