I’m certain that if someone did collect data from the Fediverse, it would become a hot topic.
I’d assume bad actors (or at least chaotic neutral actors) are slurping up the entire fediverse already. It is trivial to do, and nobody would know.
I mean, the whole point is that anyone can spin up a server and federate with others. I could start my own server, which would by default federate with almost all other servers. That means I wouldn’t even need to write a scraper. All that data would be sent straight to my server. All I need is access to my own database at that point. With Lemmy, I’d even get users’ upvote/downvote history, which is not visible in any clients AFAIK. The only barrier would be to subscribe to communities on different servers to kickstart federation.
As long as you don’t run obvious spam/bot accounts, nobody would block your instance.
Alternatively, if you want to write a scraper, that’s also pretty easy. Most servers are publicly accessible. Every community has an RSS feed. You don’t even need an account in general. Again, the whole point is to be open and accessible, in contrast to closed-off data-misers like Facebook, Reddit, and X.
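To give a sense of how little code the RSS route takes, here is a minimal sketch that pulls the title, link, and date out of a feed using only the Python standard library. It assumes a plain RSS 2.0 document; `parse_rss_items` and `SAMPLE_FEED` are made-up names for illustration, the sample data is invented, and fetching the XML from a real server is left out on purpose.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml: str) -> list[dict]:
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    # iter() walks the whole tree, so this finds <item> under <channel>.
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # pubDate is optional in RSS 2.0, so fall back to "".
            "published": item.findtext("pubDate", default=""),
        })
    return items


# Invented sample feed; a real one would come from a community's feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example community</title>
    <item>
      <title>First post</title>
      <link>https://example.invalid/post/1</link>
      <pubDate>Mon, 01 Jan 2024 12:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.invalid/post/2</link>
    </item>
  </channel>
</rss>"""

for entry in parse_rss_items(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```

No account or API key is involved anywhere in that, which is the point being made above.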
The fediverse is friendly to users, with very little regard for what those users might do. I believe this is the correct philosophy, but I won’t pretend that it doesn’t leave us open to bad behavior.
Danterious@lemmy.dbzer0.com 2 months ago
Why do you think that? I don’t think there is anything systemic in how the fediverse operates that will stop LLMs from polluting the discourse here too. Actually, I think they are already polluting the discourse here.
Anti Commercial-AI license (CC BY-NC-SA 4.0)
Melody@lemmy.one 2 months ago
The filtering capabilities available to most users are pretty robust, depending on what you use to interact with the Fediverse. I think it would be possible to filter out problematic bots, users, and even whole domains with the right kind of software.
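As a rough illustration of that kind of client-side filtering, here is a small sketch that hides posts by blocked user, blocked domain, or bot flag. Everything in it is hypothetical: `Post`, `Filter`, and the `is_bot` field are invented for the example and are not taken from any real client or from Lemmy's data model. It also assumes authors are written as `name@domain` and that the block lists are stored in lowercase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    author: str          # assumed form: "name@instance.domain"
    body: str
    is_bot: bool = False  # hypothetical flag; a real client may expose this differently


@dataclass
class Filter:
    blocked_users: set[str] = field(default_factory=set)    # lowercase "name@domain"
    blocked_domains: set[str] = field(default_factory=set)  # lowercase domains
    hide_bots: bool = False

    def allows(self, post: Post) -> bool:
        """Return True if the post should be shown to the user."""
        author = post.author.lower()
        # Take everything after the last "@" as the domain, if there is one.
        domain = author.rpartition("@")[2] if "@" in author else ""
        if author in self.blocked_users:
            return False
        if domain in self.blocked_domains:
            return False
        if self.hide_bots and post.is_bot:
            return False
        return True


# Invented sample data.
posts = [
    Post("alice@example.invalid", "hello"),
    Post("spammer@example.invalid", "buy now"),
    Post("helper@bots.invalid", "automated summary", is_bot=True),
    Post("feedbot@example.invalid", "new article", is_bot=True),
]
my_filter = Filter(
    blocked_users={"spammer@example.invalid"},
    blocked_domains={"bots.invalid"},
    hide_bots=True,
)
visible = [p for p in posts if my_filter.allows(p)]
print([p.author for p in visible])
```

The checks are deliberately independent of each other, so a user could turn on only the domain list, only the bot flag, or any mix.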
Danterious@lemmy.dbzer0.com 2 months ago
Good point. I have been a lot more active in tailoring my experience here compared to other social media. I wish there were more tools for deciding whether or not you want to block someone, though. Sometimes it’s not as simple as just looking at their post history. Also, as an aside, I wish it were possible to block votes as well, so that the ranking of content could be personalized too.
Anti Commercial-AI license (CC BY-NC-SA 4.0)
Melody@lemmy.one 2 months ago
Such a system could be built for one’s own scraping needs by taking any one of the current frontends or backends and customizing its behavior so that it mitigates issues, or ingests or ignores data, based on your own inputs. That way your model could be “riding along on a human surfboard with human guidance”.