Comment on What is the Anti Commercial-AI license and why do people keep adding it to their comments?
sushibowl@feddit.nl 7 months ago
It would be pretty funny if GPT starts putting licence notices under its answers because that’s what people do in its training data.
hsr@lemmy.dbzer0.com 7 months ago
Until now I was under the impression that this was the goal of these notices:
Because if an LLM ingests a comment with a copyright notice like that, there’s a chance it will start appending copyright notices to its own responses, which could technically, legally, maybe make the AI model CC BY-NC-SA 4.0? A way to “poison” the dataset?
(also, I have no clue about copyrights)
cley_faye@lemmy.world 7 months ago
Your first mistake was thinking the companies training these models care. They’re actively lobbying for the right to say “fuck copyright when it benefits us!”
Your second mistake is assuming that LLM training blindly puts everything in. There are human filters, then automated filters, and then the LLM itself, which blurs things out. I can’t speak to the last one, but the first two will easily strip such simple noise, the same way search engines very quickly became immune to random keyword spam two decades ago.
Note that I didn’t even bother to check whether adding these little extra blurbs is useful in any way, legally speaking. I doubt it would help, though. Service ToS and regulatory bodies probably carry more weight than that.