riskable
@riskable@programming.dev
Father, Hacker (Information Security Professional), Open Source Software Developer, Inventor, and 3D printing enthusiast
- Comment on Square Enix, Bandai, and other Japanese studios demand OpenAI stop using their content without permission, drop a not-too-subtle hint about legal trouble if it doesn't 15 hours ago:
There’s no legal distinction between what a search engine scraper does and what an AI scraper does. They’re literally the same exact fucking thing.
Google scrapes a site and puts it in their database.
An AI company scrapes a site and puts it in their database.
You’re trying to make a distinction based on what happens after the data has been collected and completely ignoring the fact that search engine scrapers and AI scrapers are performing the same activity.
In fact it’s everyone’s right to scrape the Internet! Go, scrape it! Whether it’s Google or AI companies or Joe Schmoe. Scraping is legal and a perfectly normal activity. Just because some AI scrapers are fucking up has no bearing on whether or not the activity of scraping is bad/good.
We learned this lesson in the 90s: If you don’t want someone scraping your stuff don’t put it on the fucking Internet!
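To make the point concrete, here's a minimal sketch of the activity both kinds of company perform: fetch a public page, put it in a database. The URL scheme and table layout are hypothetical, but mechanically this is the whole "scrape" step, whether the downstream consumer is a search index or a training pipeline.

```python
# Minimal sketch of what any scraper does, search engine or AI alike:
# fetch a page, then cache it in a database. Schema is a made-up example.
import sqlite3
import urllib.request

def fetch(url: str) -> str:
    """Download the raw HTML of a public page."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def store(con: sqlite3.Connection, url: str, html: str) -> None:
    """Cache the page body keyed by URL -- identical step for search and AI."""
    con.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    con.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, html))
    con.commit()
```

Everything people actually argue about happens *after* `store()` returns.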
- Comment on Square Enix, Bandai, and other Japanese studios demand OpenAI stop using their content without permission, drop a not-too-subtle hint about legal trouble if it doesn't 15 hours ago:
A VCR spits out copyrighted material because the user put it there (by inserting a cassette). Just like how an AI can spit out something that resembles copyrighted works when a user asks it to do so.
It’s not a perfect analogy but it fits. Especially from a legal perspective.
- Comment on Square Enix, Bandai, and other Japanese studios demand OpenAI stop using their content without permission, drop a not-too-subtle hint about legal trouble if it doesn't 1 day ago:
Many of the AI companies have been downloading content through means other than scraping, such as bittorrent, to access and compile copyrighted data that is not publicly scrape-able. That includes Meta, OpenAI, and Google.
Anthropic is the only company to have admitted publicly to doing this. They were sued and settled out of court. Google and OpenAI have had no such accusations as far as I’m aware. Furthermore, Google had the gigantic book scanning project where it was determined in court that the act of scanning as many fucking books as you want is perfectly legal (fair use). Read all about it: en.wikipedia.org/…/Authors_Guild,_Inc._v._Google,….
In late 2013, after the class action status was challenged, the District Court granted summary judgment in favor of Google, dismissing the lawsuit and affirming the Google Books project met all legal requirements for fair use. The Second Circuit Court of Appeals upheld the District Court’s summary judgment in October 2015, ruling Google’s “project provides a public service without violating intellectual property law.” The U.S. Supreme Court subsequently denied a petition to hear the case.
You say:
That is also false. Just because you don’t understand the legal distinction between scraping content to summarize in order to direct people to a site (there was already a lawsuit against Google that established this, as well as its boundaries), versus scraping content to generate a replacement that obviates the original content, doesn’t mean the law doesn’t understand it.
There is no such legal distinction. Scraping content is legal no matter WTF you plan to do with it. This has been settled in court many, many times. Here’s some court cases for you to learn the actual legality of scraping and storing of said scraped data:
- en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn (recent!)
- en.wikipedia.org/wiki/Van_Buren_v._United_States — This one is really relevant because it reaffirmed the concept of, “if you put it on the Internet, anyone can download it; that’s fair use… regardless of intent.” Putting up some acceptable use policy that says, “you may not scrape this”—or use it to train AI—is irrelevant. You have to read the full ruling to get all those details though (sorry).
- en.wikipedia.org/wiki/Kelly_v._Arriba_Soft_Corp. and en.wikipedia.org/…/Perfect_10,_Inc._v._Amazon.com…. — These are the big court cases that determined that what search engines do “is transformative” and therefore, fair use. The courts ruled that summarizing (thumbnails or generating text) what you scrape is fair use. By extension, one can assume that what an AI does with the scraped data is the same thing. Remember: At the heart of every AI model is a random number generator. They’re not “plagiarism machines” as many would like to believe. They’re random word prediction engines.
- en.wikipedia.org/wiki/Field_v._Google,_Inc. — This is the court case that ruled—very specifically!—that caching the entire copyrighted works in your database is perfectly legal.
To summarize all this: You are 100% wrong. I have cited my sources. I was there (“3000 years ago…”) when all this went down. Pepperidge Farm remembers.
You say:
And none of this matters, because AI companies aren’t just reading content, they’re taking it and using it for commercial purposes.
This is a common misconception of copyright law: Remember Napster? They were sued and argued in court that because users don’t profit from sharing songs with their friends, it is legal. The court rejected this argument: en.wikipedia.org/…/A%26M_Records,_Inc._v._Napster…. See also: en.wikipedia.org/…/Capitol_Records,_Inc._v._Thoma… and en.wikipedia.org/…/Harper_%26_Row_v._Nation_Enter… and en.wikipedia.org/…/American_Geophysical_Union_v._…. where the courts all ruled the same way.
You say:
Perhaps you are unaware, but (at least in the US) while it is legal for you to view a video on YouTube, if you download it for offline use that would constitute copyright infringement if the owner objects. The video being public does not grant anyone and everyone the right to use it however they wish. Ditto for something like making an mp3 of a song on Spotify using Audacity.
Downloading a Youtube video for offline use is legal… Depending on the purpose. This is one of those very, very nuanced areas of copyright law where fair use intersects with the DMCA and also intersects with the CFAA. The DMCA states, “No person shall circumvent a technological measure that effectively controls access to a work protected under this title.” Since Youtube videos have some technical measures to prevent copying (depending on the resolution and platform!), it is illegal to circumvent them. However, The Librarian of Congress can grant exceptions to this rule and has done so for many situations. For example, archiving (arl.org/…/librarian-of-congress-expands-dmca-exem…) which is just plain wacky, IMHO.
Regardless, if Youtube didn’t put an anti-circumvention mechanism into their videos it would be perfectly legal to download the videos. Just like it’s legal to record TV shows with a VCR. This was ruled in Sony Corp. of America v. Universal City Studios (already cited). There’s no reason why it wouldn’t still apply to Youtube videos. The fact that no one has been sued for doing this since then (that I could find) seems to indicate that this is a very settled thing.
You say:
no one is arguing that it is theft, they are arguing that it is copyright infringement, which is what all of us are also subject to under the DMCA. So we’re actually arguing that AI companies should be held to the same standard that we are.
No. Fuck no. A shitton of people are saying it’s “theft”. Have you been on the Internet recently? LOL! I see it every damned day and I’m sick of it. I repeat, “it’s not theft, it’s copyright infringement” and I get downvoted for “being pedantic”. Like it’s not a very fucking important distinction!
…but also: What an AI model does isn’t copyright infringement (usually). You ask it to generate an image or some text and it just does what you ask it to do. The fact that it’s possible for it to infringe copyright shouldn’t matter because it’s just a tool like a Xerox machine/copier. It has already been ruled fair use for an AI company to train their models with copyrighted works (great summary of that here: debevoise.com/…/anthropic-and-meta-decisions-on-f… ). Despite these TWO court rulings, people are still saying that training AI models is both “theft” and somehow “illegal”. We’re already past that.
AI models are terrible copyright violators! Everything they generate—at best—can only ever be, “kinda sorta like” a copyrighted work. You can get closer and closer if you get clever with prompts and tell the model to generate say, 10000 images of the same thing. Then you can look at your prayers to the RNG gods and say, “Aha! Look! This image looks very very similar to Indiana Jones!”
You say:
Also, note that AI companies have argued in court (in the case brought by Steven King et al) that their use of copyrighted material shouldn’t fall under DMCA at all (i.e. arguing that it’s not about Fair Use), because their argument is that AI training is not the ‘intended use’ of the source material, so this is not eating into that commercial use. That argument leaves copyright infringement liability intact for the rest of us, while solely exempting them from liability. No thanks.
Luckily, them arguing they’re apart and separate from Fair Use also means that this can be rejected without affecting Fair Use! Double-win!
Where TF did you see this? I did some searching and I cannot see anything suggesting that the AI companies have rejected any kind of DMCA protection.
- Comment on Square Enix, Bandai, and other Japanese studios demand OpenAI stop using their content without permission, drop a not-too-subtle hint about legal trouble if it doesn't 1 day ago:
If you believe AI companies should NOT be allowed to train AI with copyrighted works you should stop using Internet search engines. Because the same rules that allow Google to train their search with everyone’s copyrighted websites are what allow the AI companies to train their models.
Every day, Google and others download huge swaths of the Internet directly into their servers and nobody bats an eye. An AI company does the same thing and now people say that’s copyright infringement.
What the fuck! I don’t get it. It’s the exact same thing. Why is an AI company doing that any different‽
It’d be one thing if people were bitching about just the output of AI models but they’re not. They’re bitching about the ingress step!
The day we ban ingress of copyrighted works into whatever TF people want is the day the Internet stops working.
My comment right here is copyrighted. So is yours! I didn’t ask your permission before my Lemmy client downloaded it. I don’t need to ask your permission to use your comment however TF I want until I distribute it. That’s how the law works. That’s how it’s always worked.
The DMCA also protects the sites that host Lemmy instances from copyright lawsuits. Because without that, they’d be guilty of distribution of copyrighted works without the owner’s permission every damned day.
People who hate AI are supporting an argument that the movie and music studios made in the 90s: That “downloading is theft.” It is not! In fact, because that is not theft, we’re all able to enjoy the Internet every day.
Ever since the Berne convention, literally everything is copyrighted. Everything.
- Comment on Square Enix, Bandai, and other Japanese studios demand OpenAI stop using their content without permission, drop a not-too-subtle hint about legal trouble if it doesn't 1 day ago:
That “blank tape charge” was only implemented in Canada. Not the US.
- Comment on Square Enix, Bandai, and other Japanese studios demand OpenAI stop using their content without permission, drop a not-too-subtle hint about legal trouble if it doesn't 1 day ago:
Your VCR is hooked up to your TV (coaxial into the VCR and from the VCR into the TV). Just before the broadcast starts, you press the record button (which was often mechanically linked to the play button). When it’s done, you press stop. Then you rewind and can play it back later.
The end user is sort of passively recording it. The broadcast happens regardless of the user’s or the VCR’s presence.
- Comment on Square Enix, Bandai, and other Japanese studios demand OpenAI stop using their content without permission, drop a not-too-subtle hint about legal trouble if it doesn't 1 day ago:
You obviously never used a VCR to record a live broadcast before. When people were using a VCR to record things, that’s what they were doing 99% of the time. Nobody had two VCRs hooked up to each other to copy tapes. That was a super rare situation that you’d typically only find in professional studios.
- Comment on Square Enix, Bandai, and other Japanese studios demand OpenAI stop using their content without permission, drop a not-too-subtle hint about legal trouble if it doesn't 2 days ago:
Imagine you have a magic box that can generate any video you want. Some people ask it to generate fan fiction-like videos, some ask it to generate meme-like videos, and a whole lot of people ask it to generate porn.
Then there’s a few people that ask it to generate videos using trademarked and copyrighted stuff. It does what the user asks because there’s no way for it to know what is and isn’t copyrighted. What is and isn’t parody or protected fair use.
It’s just a magic box that generates videos… Whatever the human asks for.
This makes some people and companies very, very upset. They sue the maker of the magic box, saying it’s copying their works. They start PR campaigns, painting the magic box in a bad light. They might even use the magic box quite a lot themselves but it doesn’t matter. To them, the magic box is pure evil; indirectly preventing them from gaining more profit… Somehow. Just like Sony was sued for making a machine that let people copy whatever videos they wanted (en.wikipedia.org/…/Sony_Corp._of_America_v._Unive….).
Before long, other companies make their own magic boxes and then, every day people get access to their own, personal magic boxes that no one can see the output from unless they share.
Why is this different from the Sony v. Universal situation? The AI magic box is actually worse at copying videos than a VCR.
When a person copies—and then distributes—a movie do we say the maker of the VCR/DVD burner/computer is at fault for allowing this to happen? No. It’s the person that distributed the copyrighted work.
- Comment on Trial Begins for Man Accused of Lobbing a Sandwich at a Federal Agent 2 days ago:
This administration hates heroes!
- Comment on Nearly 90% of Windows Games now run on Linux, latest data shows — as Windows 10 dies, gaming on Linux is more viable than ever 1 week ago:
FYI: That’s more Windows games than run in Windows!
WTF? Why? Because a lot of older games don’t run in newer versions of Windows than when they were made! They still run great in Linux though 👍
- Comment on Wikipedia Says AI Is Causing a Dangerous Decline in Human Visitors 2 weeks ago:
It’s possible that a huge number of sites that are currently free will turn to paywalls, but that won’t make the local AI useless. You’ll just give it your login credentials for any sites you want it to search and it’ll do its thing (and no, captchas don’t work with AI models… They’re only good at stopping basic crawlers).
- Comment on Wikipedia Says AI Is Causing a Dangerous Decline in Human Visitors 2 weeks ago:
I’ve been saying this for some time now: AI is going to kill so many business models because it’s really great at creating summaries.
You don’t even need a huge cloud-based AI! Local AI—running on your PC—can search the web, summarize the news (and Wikipedia articles), and perform similar tasks without a human ever visiting the site.
It’s like having your own personal secretary that you can tell to go do stuff.
I think it’s going to kill free search engines because it can go do a search on all of them at once in seconds and no human will ever see those ads.
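The "personal secretary" loop described above can be sketched in a few lines. `summarize()` here is a crude extractive placeholder standing in for whatever local model you'd actually run, so the pipeline stays self-contained and runnable; everything else about the flow (fetch a batch of pages, reduce each to a glanceable summary, never show a human the site) is the real shape of it.

```python
# Sketch of the local-AI "secretary" pipeline: batch of fetched pages in,
# one-glance summaries out. summarize() is a stand-in for a local LLM.
import re

def summarize(text: str, max_sentences: int = 2) -> str:
    """Placeholder for a local model: keep the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])

def digest(pages: dict) -> dict:
    """Map each fetched page (url -> body) to a short summary."""
    return {url: summarize(body) for url, body in pages.items()}
```

Swap `summarize()` for a call into a local model and the site's ads, layout, and visit counters never enter the picture.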
- Comment on Trump’s pick to lead Bureau of Labor Statistics ran Twitter account with sexually degrading, bigoted attacks 1 month ago:
So what they’re saying is that this person was clearly qualified to work for this administration.
- Comment on Amazon's strict RTO policy is costing it top tech talent, according to internal document and insiders 1 month ago:
We continue to believe that teams produce the best results when they’re collaborating and inventing in person,
First of all, that’s 100% bullshit.
Secondly, what about individuals‽ Ya know, the people that make up “the team.”
Studies have shown that individuals produce the best results when they’re working alone and not bothered regularly by office bullshit.
…but let’s get more specific, because Amazon is talking about innovation and “inventing”: Study after study has shown that the kind of “group brainstorming” Amazon is referring to here produces worse results than having individuals work on ideas alone and then pooling them together afterwards.
Literally the opposite of what they’re claiming.
Amazon: Believe your own bullshit at your peril. Well, at the loss of tech talent I guess 🤷
- Comment on That's an impressive drop. Any ideas why? 2 months ago:
I’m guessing this graph matches closely with anime viewing… The true amplifier of Japan’s population decline!
To solve this crisis, we must make catgirls real and unleash an army of bland protagonists: young men with almost no personality who possess some overpowered skill. Such as the ability to stay thin despite the ready availability of sugary/processed foods.
- Comment on AI was a common theme at Gamescom 2025, and while some indie teams say it's invaluable, it remains an ethical nightmare 2 months ago:
Training an AI is orthogonal to copyright since the process of training doesn’t involve distribution.
You can train an AI with whatever TF you want without anyone’s consent. That’s perfectly legal fair use. It’s no different than if you copy a song from your PC to your phone.
Copyright really only comes into play when someone uses an AI to distribute a derivative of someone’s copyrighted work. Even then, it’s really the end user that is even capable of doing such a thing by uploading the output of the AI somewhere.
- Comment on Who is the enemy? 2 months ago:
Ugh, if only that worked for longer than like a month.
Eventually all these materials you can throw under throw rugs to make them stickier end up failing. Catastrophically.
Make sure to get a throw rug that has the non-slip feature sewn in. Make sure it’s nice and heavy too and never put it in the dryer (it’ll ruin the non-slip part). You should probably air dry throw rugs anyway, actually 🤷
- Comment on Who is the enemy? 2 months ago:
The rug threw you. That’s why they’re called that!
- Comment on Who is the enemy? 2 months ago:
Xerox is a bad copy of themselves from decades prior.
- Comment on Who is the enemy? 2 months ago:
Try this and the result may shock you!
Doctors hate it!
- Comment on Who is the enemy? 2 months ago:
…and people that work with resin and 3D printers.
- Comment on Who is the enemy? 2 months ago:
I’m going to assume the standard was poorly understood because I can’t imagine a multi-billion dollar company hires idiots to set standards.
Ahahahahahahaha! Oh man, you got a good laugh out of me this morning 🤣
- Comment on Uhm 2 months ago:
For images, it’s not even data collection because all the images that are used for these AI image generation tools are out on the internet for free for anyone to download right now. That’s how they’re obtained: A huge database of (highly categorized) image URLs (e.g. ImageNET) is crawled/downloaded.
That’s not even remotely the same thing as “data collection”. That’s when a company vacuums everything they can from your private shit. Not that photo of an interesting building you uploaded to flickr over a decade ago.
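The ImageNET-style workflow mentioned above is simple enough to sketch: the "dataset" is just a list of public image URLs, and building the training corpus means walking that list. The file format and output paths here are hypothetical, but this is the shape of the process.

```python
# Sketch of assembling an image training set from a public URL list
# (ImageNET-style). List format and output naming are made-up examples.
import os
import urllib.request

def parse_url_list(text: str) -> list:
    """Pull the non-empty URLs out of a newline-separated list file."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def crawl(list_text: str, out_dir: str) -> list:
    """Download every listed image into out_dir; returns the saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for i, url in enumerate(parse_url_list(list_text)):
        dest = os.path.join(out_dir, f"{i:06d}.img")
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            out.write(resp.read())
        saved.append(dest)
    return saved
```

Nothing in that loop touches anyone's private data; it only reaches URLs that were already public.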
- Comment on Uhm 2 months ago:
This is sad, actually, because this very technology is absolutely fantastic at identifying things in images. That’s how image generation works behind the scenes!
(Image: an esp32-cam identifying a cat, a bike, and a car in an image.)
ChatGPT screwed this up so badly because it’s programmed to generate images instead of using reference images and then identifying the relevant parts. Which is something a tiny little microcontroller board can do.
If they just paid to license a data set of medical images… Oh wait! They already did that!
Sigh
- Comment on The air is hissing out of the overinflated AI balloon 2 months ago:
LLMs are great at checking grammar in writing. That’s the other thing I’ve found them useful for 🤷
Basically, using LLMs to write something is always a bad idea (unless you’re responding to bullshit with more bullshit e.g. work emails 🤣). Using them to check writing is pretty useful though.
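The check-don't-write distinction comes down to how you prompt. A rough sketch, where `query_model` is a hypothetical hook for whatever local or hosted model you actually use:

```python
# Sketch of using an LLM as a grammar checker rather than a ghostwriter:
# the prompt constrains the model to reporting problems, not rewriting.
# query_model is a hypothetical hook (any prompt -> text callable).

def build_grammar_prompt(text: str) -> str:
    """Constrain the model to checking, not writing."""
    return (
        "List any grammar or spelling problems in the text below. "
        "Do not rewrite or expand it; only report issues.\n\n"
        f"TEXT:\n{text}"
    )

def check_grammar(text: str, query_model) -> str:
    """Run the check through whichever model backend you've plugged in."""
    return query_model(build_grammar_prompt(text))
```

The point is that the human's words stay the human's words; the model only plays proofreader.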
- Comment on Caption this. 2 months ago:
Frogamagogery
- Comment on It Took Many Years And Billions Of Dollars, But Microsoft Finally Invented A Calculator That Is Wrong Sometimes 2 months ago:
I hate Microsoft and Excel but that date thing is exactly the kind of stuff that AI would be great at.
Just not the kind of AI Microsoft probably plans to put in Excel 🤷
- Comment on Japan's 1st osmotic power plant begins operating in Fukuoka 2 months ago:
Well it’s certainly not sad news!
- Comment on USB-C extensions are not allowed for a reason 2 months ago:
3rd party, universal cable != Circuit boards/connectors designed for very specific hardware (used internally)
- Comment on [deleted] 2 months ago:
No cap!