Also, anyone can set up a personal Lemmy instance with no users, run a bot that subscribes to many communities, and end up with a whole database they could use to build personalized recommenders targeted at every single user.
Comment on Hand crafted bot accounts and community targeted ads, what's the story?
jet@hackertalks.com 1 year ago
I can’t speak specifically to the infosec bots, but I suspect it has something to do with all of the Lemmy instances mirroring every post. That could add a lot of SEO weight for various websites. So if they can get a post up that doesn’t get deleted, that’s SEO fodder.
Zeth0s@lemmy.world 1 year ago
bulwark@infosec.pub 1 year ago
The SEO-angle is interesting, thank you for the insight!
Deebster@lemmyrs.org 1 year ago
Seems like Lemmy should add a rel=canonical link when browsing federated communities - this would solve this issue (and would be the correct thing to do anyway).
jonne@infosec.pub 1 year ago
I believe Lemmy instances disallow crawling by default, so SEO is probably not why. Would be nice to find Lemmy results in Google if they can sort out the canonical URL problem. Reddit was a great resource for random questions, and if people move here those answers should still be easy to find.
ptz@dubvee.org 1 year ago
Nope, it’s allowed.
The default robots.txt disallows access to a few paths but not /post or /comment.
There are lots of crawler bots hitting my instance (ByteSpider being the most aggressive). I just have a list of User-Agent regexes I use to block them via Nginx. Some, like Semrush, have IP ranges I can block completely at the firewall (in addition to the UA filters).
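For anyone curious what the User-Agent filtering boils down to, here is a minimal Python sketch of the matching logic. It is not the actual Nginx config: the pattern list and the `is_blocked` helper are hypothetical, and only the two crawler names come from the comment above.

```python
import re

# Hypothetical pattern list. ByteSpider and Semrush are the crawlers named
# in the thread; the exact regexes are assumptions made for illustration.
BLOCKED_UA_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"bytespider", r"semrush")
]


def is_blocked(user_agent: str) -> bool:
    """Return True if any blocked pattern occurs anywhere in the User-Agent."""
    return any(pattern.search(user_agent) for pattern in BLOCKED_UA_PATTERNS)
```

The match is case-insensitive and unanchored, so the crawler name is caught wherever it sits inside a longer User-Agent string. A crawler that sends a different User-Agent would get past this check, which is where the firewall-level IP range blocks come in.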
Deebster@lemmyrs.org 1 year ago
What makes you say that? robots.txt just disallows things like /create_community, and there are no robots, googlebot, etc. meta tags in the source that I can see, and no nofollow apart from on a few things like feeds.
Also, I'm sure I've seen Lemmy appearing in search results already.
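The robots.txt point is easy to check with Python's standard `urllib.robotparser`. This is a rough sketch: the sample file is a hypothetical one modeled on the description in this thread (a few paths disallowed, /post and /comment not listed), not a copy of Lemmy's real default, and `allowed_paths` is a helper name made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, modeled on the thread's description.
# It is NOT copied from Lemmy's actual default file.
SAMPLE_ROBOTS_TXT = """\
User-Agent: *
Disallow: /login
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
"""


def allowed_paths(robots_txt: str, paths: list[str], agent: str = "*") -> dict[str, bool]:
    """Map each path to whether `agent` may crawl it under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


result = allowed_paths(
    SAMPLE_ROBOTS_TXT,
    ["/post/123", "/comment/456", "/create_community"],
)
```

To test a real instance instead, `RobotFileParser` also has `set_url()` and `read()` for fetching the live robots.txt before calling `can_fetch()`.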