If you believe AI companies should NOT be allowed to train AI with copyrighted works, you should stop using Internet search engines, because the same rules that allow Google to train their search with everyone’s copyrighted websites are what allow the AI companies to train their models.
Sorry, but no. Most search bots have for years been quite reasonable about following sites' instructions on what to scrape and what not to. AI scrapers have shown they are willing to go to great lengths to scrape content against the wishes of website owners.
Additionally, this scraping has been shown to cause a tremendous amount of problems for some sites and platforms, open source projects for example.
Last, search engines are a win-win. They get to show ads and then redirect traffic to the source. LLMs for the most part steal that traffic, by regurgitating the same content they stole in the first place.
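For anyone wondering what "following instructions from the sites" looks like in practice: it is usually a robots.txt file, which a well-behaved crawler checks before fetching a page. Below is a minimal sketch using Python's standard urllib.robotparser. The bot names (ExampleSearchBot, ExampleAIBot), the rules, and the example.org URLs are made up for illustration and do not describe any real crawler or site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the search bot is only kept out of /private/,
# while the AI bot is asked to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /private/

User-agent: ExampleAIBot
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("ExampleSearchBot", "ExampleAIBot"):
        for url in ("https://example.org/blog/post", "https://example.org/private/notes"):
            print(bot, url, may_fetch(bot, url))
```

Note that nothing enforces this check. A crawler has to choose to run it and honor the answer, which is exactly the behavior being argued about here.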
riskable@programming.dev 19 hours ago
There’s no legal distinction between what a search engine scraper does and what an AI scraper does. They’re literally the same exact fucking thing.
Google scrapes a site and puts it in their database.
An AI company scrapes a site and puts it in their database.
You’re trying to make a distinction based on what happens after the data has been collected and completely ignoring the fact that search engine scrapers and AI scrapers are performing the same activity.
In fact it’s everyone’s right to scrape the Internet! Go, scrape it! Whether it’s Google or AI companies or Joe Schmoe. Scraping is legal and a perfectly normal activity. Just because some AI scrapers are fucking up has no bearing on whether or not the activity of scraping is bad/good.
We learned this lesson in the 90s: If you don’t want someone scraping your stuff don’t put it on the fucking Internet!