Comment on Developer Creates Infinite Maze That Traps AI Training Bots
lvxferre@mander.xyz 3 days ago
This looks interesting. I’d probably combine it with model poisoning - giving each page longer chunks of text, containing bullshit claims and “grammar of slightly brokenness”; so if the data is used to train a model with, the result gets worse.
jherazob@beehaw.org 3 days ago
By now i’ve seen like 6 or 7 projects on either trapping or outright poison content for LLM bots, and yesterday i saw this one which outright modifies the HTML of your page making it harder to steal and instead replaces it with a random prompt, someone asks it to summarize a blogpost and instead the thing starts talking about poodles or something, while normal browser users notice nothing
lvxferre@mander.xyz 3 days ago
Yup, something like this - but for the honeypot, not for the legit pages.
jherazob@beehaw.org 3 days ago
I think many of them already do, i know Iocaine uses Markov chain text generation to spit out nonsense to poison the LLMs, do check it, at the bottom of the project page it links to others