Comment on Black Mirror AI
Novocirab@feddit.org 3 hours ago
Thought: There should be a federated system for blocking IP ranges that other server operators within a chain of trust have already identified as belonging to crawlers.
(Here’s an advantage of Markov chain maze generators like Nepenthes: Even when crawlers recognize that they have been served garbage and delete it, one still has obtained highly reliable evidence that the IPs that requested it do, in fact, belong to crawlers.)
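To show how that evidence could feed a shared blocklist, here is a minimal sketch, assuming an nginx-style combined access log and a made-up `/maze/` trap prefix (neither is a Nepenthes default): scan the log for hits on the trap path and publish the requesting IPs as a plain-text list that operators in the chain of trust can fetch and merge.

```python
# Hedged sketch: turn hits on a Nepenthes-style trap path into a shareable
# blocklist. The log path, the "/maze/" prefix, and the output location are
# assumptions for illustration only.
import re
from pathlib import Path

ACCESS_LOG = Path("/var/log/nginx/access.log")          # assumed log location
TRAP_PREFIX = "/maze/"                                   # assumed trap URL prefix
BLOCKLIST_OUT = Path("/var/www/share/crawler-ips.txt")   # served to trusted peers

# Combined log format: first field is the client IP,
# the requested path is inside the first quoted request line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

def trap_hitters(log_path: Path, prefix: str) -> set[str]:
    """Return the IPs that requested anything under the trap prefix."""
    ips: set[str] = set()
    with log_path.open(errors="replace") as fh:
        for line in fh:
            m = LOG_RE.match(line)
            if m and m.group(2).startswith(prefix):
                ips.add(m.group(1))
    return ips

if __name__ == "__main__":
    ips = trap_hitters(ACCESS_LOG, TRAP_PREFIX)
    # One IP per line; peers in the chain of trust can pull and merge this file.
    BLOCKLIST_OUT.write_text("\n".join(sorted(ips)) + "\n")
```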
Opisek@lemmy.world 3 hours ago
You might want to take a look at CrowdSec if you don’t already know it.
Opisek@lemmy.world 3 hours ago
There are many continuously updated IP blacklists on GitHub. Personally, I have an automation that sources 10+ such lists and blocks all IPs that appear on like 3 or more of them. I’m not sure there are any blacklists specific to “AI”, but as far as I know, most of them already included particularly annoying scrapers before the whole GPT craze.
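As a rough sketch of that consensus approach (not the commenter’s actual automation; the list URLs below are placeholders): fetch several plain-text blocklists and keep only the IPs that appear on at least three of them, then feed the result to a firewall.

```python
# Hedged sketch of the "block IPs that appear on 3+ of several blacklists"
# idea. The URLs are placeholders, not real or recommended sources.
from collections import Counter

import requests

BLACKLIST_URLS = [
    "https://example.com/list-a.txt",   # placeholder
    "https://example.com/list-b.txt",   # placeholder
    "https://example.com/list-c.txt",   # placeholder
    # ... typically 10+ sources in practice
]
THRESHOLD = 3  # an IP must appear on at least this many lists

def fetch_ips(url: str) -> set[str]:
    """Fetch one plain-text blocklist (one IP/CIDR per line, '#' comments)."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return {
        line.strip()
        for line in resp.text.splitlines()
        if line.strip() and not line.startswith("#")
    }

def consensus_blocklist(urls: list[str], threshold: int) -> list[str]:
    """Count how many lists each IP appears on and keep those at or above the threshold."""
    counts: Counter[str] = Counter()
    for url in urls:
        counts.update(fetch_ips(url))
    return sorted(ip for ip, n in counts.items() if n >= threshold)

if __name__ == "__main__":
    for ip in consensus_blocklist(BLACKLIST_URLS, THRESHOLD):
        print(ip)  # pipe this into nftables/ipset/fail2ban or similar
```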
rekabis@lemmy.ca 3 hours ago
Holy shit, those prices. Like, I wouldn’t be able to afford any package at even 10% the going rate.
Anything available for the lone operator running a handful of Internet-addressable servers behind a single symmetrical SOHO connection? As in, anything for the other 95% of us that don’t have mountains of cash to burn?
Opisek@lemmy.world 3 hours ago
They do seem to have a free tier of sorts. I don’t use them personally; I only know of their existence and I’ve been meaning to give them a try. Seeing the pricing just now, though, I might not even bother, unless the free tier is worth anything.