Comment

Comment on Microsoft and Reddit Are Fighting About Why Bing’s Crawler Is Blocked on Reddit

My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP bans anything that visits it, and I also listed that page as disallowed in the robots.txt file.

I've only gotten like, 20 visits in the past three months though, so, very small sample size.
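
A minimal sketch of the setup described above, assuming the ban is enforced in application code. The names `TRAP_PATH`, `ROBOTS_TXT`, and `HoneypotGate`, the path `/trap-page`, and the 403 status are all hypothetical and not taken from the comment; the IP addresses are documentation-range placeholders.

```python
# Hypothetical names throughout: TRAP_PATH, ROBOTS_TXT, and HoneypotGate are
# illustrative and do not come from the original comment.
TRAP_PATH = "/trap-page"

# robots.txt tells every crawler to stay away from the trap page.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"


class HoneypotGate:
    """Tracks banned IPs and decides how to answer each request."""

    def __init__(self):
        self.banned = set()

    def handle(self, ip, path):
        """Return an HTTP status code for a request from `ip` to `path`."""
        if ip in self.banned:
            return 403  # already banned: refuse everything
        if path == TRAP_PATH:
            self.banned.add(ip)  # visiting the disallowed page triggers the ban
            return 403
        return 200  # includes /robots.txt, which must stay reachable


gate = HoneypotGate()
# A bot that honors robots.txt never requests the trap page.
polite = [gate.handle("203.0.113.5", p) for p in ("/robots.txt", "/")]
# A bot that ignores robots.txt hits the trap and is refused afterwards.
rude = [gate.handle("198.51.100.7", p) for p in ("/robots.txt", TRAP_PATH, "/")]
print(polite)  # [200, 200]
print(rude)    # [200, 403, 403]
```

Note that only the trap page triggers a ban. Requests for robots.txt itself stay allowed, so well-behaved crawlers are unaffected.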

mozz@mbin.grits.dev ⁨1⁩ ⁨year⁩ ago

I know this because I wrote a page that IP bans anything that visits it, and I also listed that page as disallowed in the robots.txt file.

This is fuckin GENIUS

- Moonrise2473@feddit.it ⁨1⁩ ⁨year⁩ ago
  only if you don’t want any visits except from yourself, because this removes your site from any search engine
  
  You should write a “Disallow: /juicy-content” rule and then block anything that tries to access that page (only bad bots would follow that path).
  - Miaou@jlai.lu ⁨1⁩ ⁨year⁩ ago
    That’s exactly what was described…?
    Moonrise2473@feddit.it ⁨1⁩ ⁨year⁩ ago
    Oops. As a non-native English speaker, I misunderstood what he meant. I wrongly understood that he had set the server to ban everything that requested robots.txt.
  - mozz@mbin.grits.dev ⁨1⁩ ⁨year⁩ ago
    You need to read the thing that was described again, more carefully. Imagine, for example, that by “a page,” the person means a page called /juicy-content or something.
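
To illustrate why only misbehaving bots would reach a path like /juicy-content, here is a rough sketch of how a compliant crawler might check the rule using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents; /juicy-content is the example path from the thread.
robots_lines = [
    "User-agent: *",
    "Disallow: /juicy-content",
]

parser = RobotFileParser()
parser.parse(robots_lines)  # parse the rules directly, no network access

# "ExampleBot" and example.com are placeholders.
trap_allowed = parser.can_fetch("ExampleBot", "https://example.com/juicy-content")
home_allowed = parser.can_fetch("ExampleBot", "https://example.com/")
print(trap_allowed, home_allowed)  # False True
```

A crawler that runs a check like this never requests the trap page, so the rest of the site stays indexable.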
MrSoup@lemmy.zip ⁨1⁩ ⁨year⁩ ago
Thank you for sharing

thingsiplay@beehaw.org ⁨1⁩ ⁨year⁩ ago
Interesting way of testing this. Another would be to query the search engines with site:your.domain, which shows results from your site only. Not an exhaustive check, but another tool to test this behavior.
