
The Open-Source Software Saving the Internet From AI Bot Scrapers

134 likes

Submitted 1 day ago by sabreW4K3@lazysoci.al to technology@beehaw.org

https://www.404media.co/the-open-source-software-saving-the-internet-from-ai-bot-scrapers/

Comments

  • theangriestbird@beehaw.org 1 day ago

    This snip at the end is so good:

    Iaso said she thinks AI companies follow her work, and that if they really want to stop her and Anubis they just need to distract her.

    “If you are working at an AI company, here’s how you can sabotage Anubis development as easily and quickly as possible,” she wrote on her site. “So first is quit your job, second is work for Square Enix, and third is make absolute banger stuff for Final Fantasy XIV. That’s how you can sabotage this the best.”

    • Geodad@beehaw.org 1 day ago

      I’d be fine with this… 🤣

  • who@feddit.org 1 day ago

    She told me she’s […] also thinking about a version that doesn’t require JavaScript, which some privacy-minded disable in their browsers.

    As someone who is keenly aware of the privacy and security problems that come with allowing web scripts, I hope she prioritizes this soon. It’s really disappointing to find sites that were readable without JavaScript suddenly inaccessible since adopting Anubis. The more sites that do this, the more people are pushed toward enabling scripts by default, exposing them to a great many trackers and web exploits that would otherwise be blocked.

    • exu@feditown.com 7 hours ago

      There’s an option using some very new HTML tag, but it’s not the default.

      anubis.techaro.lol/docs/admin/…/metarefresh

      • who@feddit.org 3 hours ago

        Interesting. Judging by that option’s name, it seems to refer to use of the HTML <meta> tag to refresh a page.

        developer.mozilla.org/en-US/docs/…/http-equiv

        Neither this tag nor using it for refresh is new at all. I don’t think I’ve seen it used to detect bots, though. I wonder what Anubis is doing here.
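
For anyone unfamiliar with the mechanism mentioned above, here is a minimal sketch of a page that uses the standard `<meta http-equiv="refresh">` tag to send the browser to another URL after a delay, with no JavaScript involved. It only illustrates the tag itself; it is not Anubis's implementation, and the function name `build_meta_refresh_page` and the example URL are hypothetical.

```python
from html import escape


def build_meta_refresh_page(target_url: str, delay_seconds: int = 5) -> str:
    """Return a small HTML page asking the browser to load target_url after a delay.

    Hypothetical helper for illustration only; not taken from Anubis.
    """
    # Standard refresh syntax: content="<seconds>; url=<target>"
    content = f"{delay_seconds}; url={target_url}"
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        # Escape the attribute value so characters like & and " stay valid HTML.
        f'  <meta http-equiv="refresh" content="{escape(content, quote=True)}">\n'
        "  <title>Please wait</title>\n"
        "</head>\n"
        "<body>Redirecting shortly; no JavaScript is involved.</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Example URL is made up for demonstration.
    print(build_meta_refresh_page("/article?token=example", delay_seconds=3))
```

A browser that honors the tag requests the target URL once the delay elapses. How a server might use that follow-up request to tell bots from people is not covered by this sketch.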

  • leaky_shower_thought@feddit.nl 1 day ago

    i like this one better than cloudflare’s turnstile.

    cf blocks me all the time for the smallest reasons and i can’t seem to find their nag email.

    • fuckwit_mcbumcrumble@lemmy.dbzer0.com 1 day ago

      I have no issues with Cloudflare, but Anubis always takes its sweet-ass time to verify me. Like 30+ seconds just sitting there, but then eventually I get in.

      • Vanilla_PuddinFudge@infosec.pub 3 hours ago

        Windows XP ended support like 20 years ago if you were wondering.

  • FundMECFSResearch@lemmy.blahaj.zone 1 day ago

    Anubis always flags me for some reason. I use Mullvad and Safari (iOS) with some ad- and tracker-blocking extensions.

    • Appoxo@lemmy.dbzer0.com 17 hours ago

      I wonder why traffic from known VPN companies is under more scrutiny than traffic from domestic households…

    • Photuris@lemmy.ml 1 day ago

      More sites in general are blocking Mullvad traffic lately (in my experience), and I’m not sure what, if anything, can be done about it.

      • FundMECFSResearch@lemmy.blahaj.zone 1 day ago

        I expect better from a popular FOSS tool being used by privacy-aware people, though.

      • Powderhorn@beehaw.org 1 day ago

        Agreed. Luckily, they don’t seem to have the full list of Mullvad IPs, so if I really want to read something, I just try another tunnel.

    • simple@piefed.social 1 day ago

      Do you have javascript or cookies disabled? That might stop you from getting past.

      • FundMECFSResearch@lemmy.blahaj.zone 1 day ago

        nope

  • remington@beehaw.org 1 day ago

    Would you edit your post and add the following archive link to the body, please?

    archive.is/VcoE1

    • who@feddit.org 1 day ago

      Unfortunately, archive.is has moved behind Cloudflare, subjecting readers to having their reading habits (both the articles and the referring communities) tracked at a large scale.

      I suggest this archive link instead:

      web.archive.org/…/the-open-source-software-saving…

      • remington@beehaw.org 1 day ago

        Unfortunately, archive.is has moved behind Cloudflare, subjecting readers to having their reading habits (both the articles and the referring communities) tracked at a large scale.

        How do you know this?

        What about ghostarchive.org?

    • sabreW4K3@lazysoci.al 1 day ago

      To be honest with you, I refuse on moral grounds. 404 are independent and do good work. You’ve already linked a paywall bypass in the comments; if anyone wants to find it, it’s not hard to scroll.

      • remington@beehaw.org 1 day ago

        OK. Fair enough.
