Comment on Microsoft and Reddit Are Fighting About Why Bing’s Crawler Is Blocked on Reddit

MrSoup@lemmy.zip 1 year ago

I doubt Google respects any robots.txt


DaGeek247@fedia.io 1 year ago
My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP bans anything that visits it, and I also put it as a disallowed path in the robots.txt file.

I've only gotten like 20 visits in the past three months though, so, very small sample size.
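
For readers who want to try the same trick, here is a minimal sketch of the ban logic. It is not the commenter's actual code: the `Honeypot` class, the `/trap` path, and the status codes are all assumptions made for illustration, and the trap path would also need a matching Disallow line in robots.txt.

```python
from dataclasses import dataclass, field


@dataclass
class Honeypot:
    """Bans any client IP that requests the trap path (hypothetical helper)."""

    # Assumed trap path; it must also be listed as Disallow in robots.txt.
    trap_path: str = "/trap"
    banned: set = field(default_factory=set)

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status code to send for a request from `ip`."""
        if ip in self.banned:
            return 403  # already banned: refuse everything
        if path == self.trap_path:
            self.banned.add(ip)  # ignored robots.txt, so ban it
            return 403
        return 200  # normal request
```

A bot that never touches `/trap` keeps getting 200 responses; one that requests it once is refused from then on. A real setup would also want to persist the ban list and to think about shared or rotating IPs.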

- mozz@mbin.grits.dev 1 year ago
  
  “I know this because I wrote a page that IP bans anything that visits it, and I also put it as a not allowed spot in the robots.txt file.”
  
  This is fuckin GENIUS
  
  - Moonrise2473@feddit.it 1 year ago
    Only if you don’t want any visits except your own, because this removes your site from every search engine.
    
    You should instead write a “Disallow: /juicy-content” rule and then block anything that tries to access that page (only bad bots would follow that path).
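
A rough sketch of why such a rule does not hide the rest of the site from well-behaved crawlers, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` agent name, and the `/blog/post` path are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: only the trap path is disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /juicy-content
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant crawler is allowed to request `path`."""
    return parser.can_fetch(agent, path)
```

Here `may_fetch("/blog/post")` is true and `may_fetch("/juicy-content")` is false, so a crawler that honors robots.txt still indexes everything else and never reaches the trap.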
    
    Miaou@jlai.lu 1 year ago
    That’s exactly what was described…?
    
    mozz@mbin.grits.dev 1 year ago
    You need to reread what was described, more carefully. Imagine, for example, that by “a page” the person means a page called /juicy-content or something.
    
- MrSoup@lemmy.zip 1 year ago
  Thank you for sharing
  
- thingsiplay@beehaw.org 1 year ago
  Interesting way of testing this. Another would be to query the search engines with site:your.domain, which shows results from your site only, and see whether any disallowed pages appear. Not an exhaustive check, but another tool for testing this behavior.
  
Moonrise2473@feddit.it 1 year ago
For ordinary site owners they do respect it, and they even warn a webmaster who submits a sitemap containing paths that are disallowed in robots.txt.
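
The same kind of conflict can be checked locally before submitting a sitemap. This is only a sketch of the idea, not how any search engine implements its warning; the function name, the `ExampleBot` agent, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_urls, agent="ExampleBot"):
    """Return the sitemap URLs that robots.txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(agent, url)]


# Assumed inputs for illustration.
robots = "User-agent: *\nDisallow: /private/\n"
urls = [
    "https://example.com/",
    "https://example.com/private/report.html",
]
conflicts = blocked_sitemap_urls(robots, urls)
```

With these inputs, `conflicts` contains only the `/private/report.html` URL, which is the entry a webmaster would either drop from the sitemap or allow in robots.txt.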
