for common people they respect and even warn a webmaster if they submit a sitemap that has paths included in robots.txt
Comment on Microsoft and Reddit Are Fighting About Why Bing’s Crawler Is Blocked on Reddit
MrSoup@lemmy.zip 2 months ago
I doubt Google respects any robots.txt
Moonrise2473@feddit.it 2 months ago
DaGeek247@fedia.io 2 months ago
My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP-bans anything that visits it, and I also listed that page as disallowed in the robots.txt file.
I've only gotten like, 20 visits in the past three months though, so, very small sample size.
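A minimal sketch of the trap described above, assuming the disallowed page lives at a hypothetical path `/trap` and that bans are kept in an in-memory set. The names `TRAP_PATH`, `banned_ips`, and `handle_request` are made up for illustration; the commenter's actual setup is not shown in the thread.

```python
# Hypothetical honeypot sketch: every name here is an assumption, not the poster's code.
TRAP_PATH = "/trap"  # assumed trap path, also listed as disallowed in robots.txt

# What the matching robots.txt could look like for this sketch.
ROBOTS_TXT = "User-agent: *\nDisallow: /trap\n"

banned_ips = set()  # in-memory ban list; a real site would persist this


def handle_request(ip: str, path: str) -> int:
    """Return an HTTP status code for a request from `ip` to `path`."""
    if ip in banned_ips:
        return 403  # already banned: refuse everything
    if path == TRAP_PATH or path.startswith(TRAP_PATH + "/"):
        banned_ips.add(ip)  # visitor ignored robots.txt: ban it
        return 403
    return 200  # normal page
```

A bot that honors robots.txt never requests `/trap`, so it is never banned; anything that does request it gets a 403 from then on, for every path.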
mozz@mbin.grits.dev 2 months ago
This is fuckin GENIUS
Moonrise2473@feddit.it 2 months ago
only if you don’t want any visits except from yourself, because this removes your site from any search engine
you should write a “disallow: /juicy-content” and then block anything that tries to access that page (only bad bots would follow that path)
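As a rough illustration of what a well-behaved bot does with such a rule, here is a sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are assumptions made up for this example; only `/juicy-content` comes from the comment above.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for this sketch: one disallowed trap path for all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /juicy-content
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if `user_agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network access
    return parser.can_fetch(user_agent, path)
```

A compliant crawler runs a check like this before each request and skips `/juicy-content`, while the rest of the site stays crawlable, so the site is not dropped from search engines.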
Miaou@jlai.lu 2 months ago
That’s exactly what was described…?
mozz@mbin.grits.dev 2 months ago
You need to reread what was described, more carefully. Imagine for example that by “a page,” the person means a page called /juicy-content or something.
MrSoup@lemmy.zip 2 months ago
Thank you for sharing
thingsiplay@beehaw.org 2 months ago
Interesting way of testing this. Another would be to query the search engines, adding
site:your.domain
to show results from your site only. Not an exhaustive check, but another tool to test this behavior.