Comment on Microsoft and Reddit Are Fighting About Why Bing’s Crawler Is Blocked on Reddit
DaGeek247@fedia.io 4 months ago
My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP-bans anything that visits it, and I also listed that page as disallowed in the robots.txt file.
I've only gotten like, 20 visits in the past three months though, so, very small sample size.
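For anyone who wants to try the same trick, here is a minimal sketch of the idea in Python. It is not the poster's actual setup: the path `/trap`, the `Honeypot` class, and the in-memory ban set are all hypothetical names picked for illustration, and a real deployment would more likely ban at the firewall or web server level.

```python
# Hypothetical trap path; it appears in robots.txt as disallowed,
# so a well-behaved crawler should never request it.
TRAP_PATH = "/trap"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"


class Honeypot:
    """Bans any client IP that requests the disallowed trap path."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.banned = set()  # in-memory only; lost on restart

    def handle(self, ip, path):
        """Return (status_code, body) for a request from `ip` to `path`."""
        if ip in self.banned:
            return 403, "banned"
        if path == "/robots.txt":
            # Reading robots.txt is always allowed and never triggers a ban.
            return 200, ROBOTS_TXT
        if path == self.trap_path:
            # Only a client that ignores robots.txt ends up here.
            self.banned.add(ip)
            return 403, "banned"
        return 200, "ok"
```

Wiring `handle` into a real server is left out on purpose. The point is only that fetching robots.txt is harmless, while fetching the page it disallows gets the IP banned from then on.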
MrSoup@lemmy.zip 4 months ago
thingsiplay@beehaw.org 4 months ago
Interesting way of testing this. Another would be to query the search engines, adding
site:your.domain
to show results from your site only. Not an exhaustive check, but another tool to test this behavior.
mozz@mbin.grits.dev 4 months ago
This is fuckin GENIUS
Moonrise2473@feddit.it 4 months ago
only if you don’t want any visits except from yourself, because this removes your site from any search engine
should write a “disallow: /juicy-content” and then block anything that tries to access that page (only bad bots would follow that path)
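As a rough illustration of why only bad bots would follow that path, this sketch uses Python's standard `urllib.robotparser` to show how a compliant crawler would read such a rule. The user agent name `ExampleBot` and the helper `may_fetch` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Parse the rule directly from text instead of fetching it over HTTP.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /juicy-content",
])


def may_fetch(path, agent="ExampleBot"):
    """Return True if a compliant crawler may request `path`."""
    return rules.can_fetch(agent, path)
```

Here `may_fetch("/")` is allowed and `may_fetch("/juicy-content")` is not, so a crawler that honors robots.txt never requests the trap and the rest of the site stays indexable.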
Miaou@jlai.lu 4 months ago
That’s exactly what was described…?
Moonrise2473@feddit.it 4 months ago
Oops. As a non-native English speaker I misunderstood what he meant. I wrongly understood that he had set the server to ban everything that requested robots.txt
mozz@mbin.grits.dev 4 months ago
You need to read again the thing that was described, more carefully. Imagine for example that by “a page,” the person means a page called /juicy-content or something.