krellor

@krellor@beehaw.org

⁨Comment⁩ on ⁨OpenAI says it’s “impossible” to create useful AI models without copyrighted material⁩ ⁨⁨2⁩ ⁨years⁩ ago⁩:
I thought about the indexing situation in contrast to the user paywall. Without thinking too much about any legal argument, it would seem that NYT having a paywall for visitors is them enforcing their right to the content, signaling that it isn’t free for all use, while them allowing search indexers access is allowing the content to be visible but not free on the market.

It reminds me of the Canadian claim that Google should pay Canadian publishers for the right to index, which I tend to disagree with. I don’t think Google or Bing should owe NYT money for indexing, but I don’t think allowing indexing confers the right to commercial use beyond indexing. I highly suspect OpenAI spoofed search indexers while crawling content specifically to bypass paywalls and the like.

I think part of what the courts will have to weigh for the fair use arguments is the extent to which NYT is harmed by the use, the extent to which the content is transformed, and the public interest between the two.

I find it interesting that OpenAI or Microsoft already pay AP for use of their content because it is used to ensure accurate answers are given to users. I struggle to see how the situation is different with NYT in OpenAI’s opinion, other than perhaps on price.

It will be interesting to see what shakes out in the courts. I’m also interested in the proposed EU rules which recognize fair use for research and education, but less so for commercial use.

Thanks for the reply! Have a great day!
⁨Comment⁩ on ⁨OpenAI says it’s “impossible” to create useful AI models without copyrighted material⁩ ⁨⁨2⁩ ⁨years⁩ ago⁩:
The issue is that fair use is more nuanced than people think, and the barrier to claiming fair use is higher when you are engaged in commercial activities. I’d more readily accept the fair use arguments from research institutions, companies that train and release their model weights (Llama), or some other activity with a clear tie to the public benefit.

OpenAI isn’t doing this work for the public benefit, regardless of the language of altruism they wrap it in. They, and Microsoft, are hoovering up others’ data to build a for-profit product and make money. That’s really what it boils down to for me. And I’m fine with them making money. But pay the people whose data you’re using.

Now, in the US there is no case law on this yet and it will take years to settle. But personally, philosophically, I don’t see how Microsoft taking NYT articles and turning them into a paid product is any different from Microsoft taking an open source project that doesn’t allow commercial use and sneaking it into a project.
⁨Comment⁩ on ⁨Starbucks accused of manipulating app payments for $900 million profit⁩ ⁨⁨2⁩ ⁨years⁩ ago⁩:
I’ve never had a Starbucks gift card or used the app, but in the article they say that in store you can do a split payment, using up either the gift card or app balance and paying the remainder in cash. Is that something you’ve tried?
⁨Comment⁩ on ⁨How bad are search results? Let's compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT⁩ ⁨⁨2⁩ ⁨years⁩ ago⁩:
I do agree that the content farms are frustrating, and I’d like to see more done to combat them. I also agree that discussions happening on locked platforms are a net loss for the sharing of information. I think search engines can do more about the former, but not as much about the latter. I think folks like us having discussions in the open can help.

I went ahead and searched “best Linux distro” and the top three results for me were
I then turned on my phone’s VPN, opened Edge, went to Google, and repeated the search with the same top three results. I tried to bypass personalization, but might need to use a clean VM with a VPN to succeed.

I actually thought all of those results were pretty good from a quick skim.

I will say I have custom DNS filters and plugins that block ads and untrustworthy domains and I can’t guarantee that didn’t influence my results.

I tried other searches like “best Linux distro” plus “programming” or “gaming” and received similarly helpful results. But I can’t tell if I’m in a personalization bubble.
⁨Comment⁩ on ⁨How bad are search results? Let's compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT⁩ ⁨⁨2⁩ ⁨years⁩ ago⁩:
I’ll play devil’s advocate.

The author is basically complaining that search results aren’t tailored to their own search habits, and for all we know they are using tools to prevent Google data collection for personalized search.

Using the search term “YouTube downloader” and making the success criterion the return of a fork of a command line Python tool is an insane test for the general public. How many of your family members who are looking to download a YouTube video would be helped by that result?

I searched “YouTube downloader” and received the usual ad-ridden websites that let you download a video. Then I searched “YouTube downloader Linux” and the top result was ytdl-org on GitHub. Seems reasonable.

I’ve seen many people complain about Google search lately. I wonder how many of them have unrealistic expectations, never learned to use scoping keywords, or stopped search personalization and lost benefits they didn’t know they were getting. And expecting a fork of a command line tool to be the top result for YouTube downloader is definitely unrealistic.

Anecdotally, I’ve used more or less the same search strategy for 30 years, and it still brings up relevant results. And while I agree that SEO gamification can make certain keywords harder than others to use, this article and its test really weren’t testing search scenarios the average non-technical user of these search engines would have.