Long load times again
plantteacher@mander.xyz 4 days ago
I’ve not tried the onion instance since reporting the data loss issue, but in principle the onion host could be a good candidate for read-only access.
Would it perhaps make sense to redirect the greedy subnet to the onion instance? I wonder if it’s even possible. The privacyinternational website used to detect requests from Tor exit nodes and automatically redirect them to their onion site. The scrapers are obviously not using Tor to visit your site, but they could have Tor installed. You would effectively be sending the msg “hey, plz do your scraping on the onion node.” That is assuming your problem is not scraping in general, but just that they are hogging bandwidth that competes with regular users. The Tor network has some built-in anti-DDoS logic now, supposedly, so they would naturally get bottlenecked IIUC.
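Something like this could do the diversion at the application layer. Just a rough Python/Flask sketch with a placeholder subnet and onion address, not anything mander actually runs; a reverse proxy rule would probably be the more usual place for it:

```python
# Rough sketch: redirect one greedy subnet to the onion front-end
# instead of serving it directly. Subnet and onion host are placeholders.
from ipaddress import ip_address, ip_network

from flask import Flask, redirect, request

app = Flask(__name__)

GREEDY_SUBNET = ip_network("203.0.113.0/24")          # placeholder subnet
ONION_HOST = "http://exampleonionaddress.onion"        # placeholder onion address

@app.before_request
def divert_greedy_subnet():
    # Behind a reverse proxy you would need the forwarded-for header
    # instead of remote_addr; kept simple here.
    client = ip_address(request.remote_addr)
    if client in GREEDY_SUBNET:
        # Returning a response from before_request short-circuits the request
        return redirect(ONION_HOST + request.full_path, code=302)
```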
I guess the next question is whether the onion site has a separate allocation of bandwidth. But even if it doesn’t, Tor has a natural bottleneck b/c traffic can only move as fast as the slowest of the 3 hops the circuit goes through.
Sal@mander.xyz 1 day ago
I have experienced issues both over Tor and over clearnet. The Tor front-end exists on its own server, but it connects to the mander server. So, the server that hosts the front-end via Tor will see the exit node connecting to it, and then the mander server gets the requests via that Tor server. Ultimately some bandwidth is used on both servers, because the data travels from mander to the Tor front-end and then to the exit node. There is also another server that hosts and serves the images.
What I see is not a bandwidth problem, though. It seems like the database queries are the bottleneck. There is a limited number of connections to the database, and some of the queries are complex and use a lot of CPU. It is the intense searching through the database that appears to throttle the website.
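For anyone curious, assuming a PostgreSQL backend (which Lemmy uses), the pg_stat_statements extension can show which queries eat the time. A rough sketch with a placeholder DSN:

```python
# Sketch: list the most expensive queries via pg_stat_statements.
# Requires the extension to be enabled; on Postgres older than 13 the
# column is total_time rather than total_exec_time.
import psycopg2

conn = psycopg2.connect("dbname=lemmy user=lemmy")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT calls, round(total_exec_time::numeric, 1) AS ms, query
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 10
    """)
    for calls, ms, query in cur.fetchall():
        print(f"{calls:>8} calls  {ms:>10} ms  {query[:80]}")
conn.close()
```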
plantteacher@mander.xyz 9 hours ago
The onion eliminates the use of exit nodes. But I know what you mean.
I appreciate the explanation. It sounds like replicating the backend and DB on the Tor node would help. Not sure how complex it would be to have the DBs synchronise during idle moments.
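Just to sketch the “idle moments” idea (hypothetical paths and threshold, not a real setup; Postgres streaming replication would be the sturdier way to get a true read-only replica):

```python
# Rough sketch: dump the DB for the Tor replica only when the primary
# looks idle, using the 1-minute load average as a crude signal.
import os
import subprocess
import time

IDLE_LOAD = 1.0  # assumed threshold; would need tuning to the machine

def wait_until_idle(check_every=300):
    # Poll until the 1-minute load average drops below the threshold
    while os.getloadavg()[0] > IDLE_LOAD:
        time.sleep(check_every)

wait_until_idle()
subprocess.run(
    ["pg_dump", "-Fc", "-f", "/backups/lemmy.dump", "lemmy"],  # placeholder path/db
    check=True,
)
# ...then ship /backups/lemmy.dump to the onion host and pg_restore it
```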
Perhaps a bit radical, but I wonder if it would be interesting to do a nightly DB export to JSON or CSV files. Scrapers would prefer that, and it would be less intrusive on the website. Though I don’t know how tricky it would be to exclude non-public data from the dataset.
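A rough sketch of what such an export could look like; the table and column names here are guesses for illustration, not Lemmy’s actual schema:

```python
# Sketch of a nightly public export: dump only columns that are safe
# to publish, skipping deleted/removed rows. Names are hypothetical.
import csv
import psycopg2

conn = psycopg2.connect("dbname=lemmy user=lemmy")  # placeholder DSN
with conn, conn.cursor() as cur:
    # Anything like emails, IPs, or private rows stays out of the dump
    cur.execute("""
        SELECT id, name, title, published
        FROM post
        WHERE NOT deleted AND NOT removed
    """)
    with open("posts_public.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col.name for col in cur.description])
        writer.writerows(cur)
conn.close()
```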