Comment on Reddit Undeleted all my posts and comments
tal@lemmy.today 2 months ago
So, I’m gonna be honest. I don’t think that mass deanonymization via text analysis is in the immediate future.
Is it a theoretical risk? Yes. My skepticism isn’t because I think it’s technically infeasible. It’s for a rather-more-depressing reason: there’s lower-hanging fruit if someone is trying to build a deanonymized database. I just don’t think that text analysis is presently worth the kind of effort required, in general.
Any time you have an account with some company that persists for a long time, if they retain a persistent IP address log, then whenever you log in, you’re linking your identity and the IP address at that time. Especially if one cross-correlates logs at a few companies, a data-miner could do a reasonably reliable job of deanonymizing someone. Maybe it’s not perfect, maybe there are several people in a household or something, maybe some material is suspect. But if you’re watching cookies in a browser on a phone crossing from one network to another and such, my guess is that you can typically map an IP address to a fairly limited number of people.
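To make the cross-correlation idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the log format, the account names, the IP addresses (taken from documentation-reserved ranges), and the one-hour window are all made up, and real logs would be far noisier.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical login records: (account, ip, timestamp). All values are invented.
SHOP_LOG = [
    ("alice.smith", "203.0.113.7", datetime(2024, 1, 5, 20, 15)),
    ("bob.jones", "198.51.100.4", datetime(2024, 1, 5, 21, 0)),
]
FORUM_LOG = [
    ("anon_fox", "203.0.113.7", datetime(2024, 1, 5, 20, 40)),
    ("quiet_owl", "198.51.100.4", datetime(2024, 1, 9, 8, 30)),
]

def link_accounts(log_a, log_b, window=timedelta(hours=1)):
    """Pair accounts from two logs that used the same IP within `window`."""
    # Index the second log by IP so each lookup is cheap.
    by_ip = defaultdict(list)
    for account, ip, ts in log_b:
        by_ip[ip].append((account, ts))

    links = set()
    for account_a, ip, ts_a in log_a:
        for account_b, ts_b in by_ip[ip]:
            # Same IP, close in time: a candidate link, not proof.
            if abs(ts_a - ts_b) <= window:
                links.add((account_a, account_b))
    return sorted(links)

print(link_accounts(SHOP_LOG, FORUM_LOG))
```

With these made-up records, only the pair that shared an IP within the hour gets linked. A shared household or a reassigned address would produce false matches, which is the "not perfect" caveat above.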
I mean, there are ways to help obfuscate that, like Tor. But virtually nobody is doing that sort of thing. And even through something like Tor, browsers tend to leak an awful lot of bits of unique information.
And if someone’s downloading an app to their phone that’s intentionally transmitting a unique identifier, then it’s pretty much game over anyway, absent something like XPrivacyLua that can forge information. Companies want to get people using their phone apps.
An individual person might be subject to doxxing from someone who wants to try to identify their real-life persona from an online persona. But I don’t think that companies are likely to go that route in the near future to try to deanonymize users en masse, because they’ve already got easier, more-reliable ways to track people that people are vulnerable to.
lvxferre@mander.xyz 2 months ago
While I don’t think that text analysis (TA) is going to replace those techniques that you mentioned, I do think that it is a threat to anonymity in the immediate future, because it’ll likely be used alongside those techniques to improve their accuracy and lower their overall costs.
The key here is machine “learning” lowering the TA fruit by quite a bit. People tend to credit ML with almost supernatural abilities, but here it’s right at home, as it’s literally made to find correlations between sets of data. And, well, TA is basically that.
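As a rough illustration of "finding correlations between sets of data", here is a toy stylometry sketch in Python. It is not how any real system works: the word list, the sample texts, and the author labels are all hypothetical, and it uses plain cosine similarity over function-word frequencies rather than ML. Real analyses rely on far more features and far more text.

```python
import math
import re
from collections import Counter

# A small, arbitrary set of function words, chosen only for this example.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "that", "it", "is", "i", "but", "not"]

def profile(text):
    """Relative frequency of each function word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def closest_author(unknown, known):
    """Return the label in `known` whose text profile best matches `unknown`."""
    target = profile(unknown)
    return max(known, key=lambda name: cosine(target, profile(known[name])))

# Hypothetical writing samples, far too short for any real conclusion.
KNOWN = {
    "author_a": "The end of the road is the start of the story.",
    "author_b": "I tried, but it is not that easy, and I am not done.",
}
print(closest_author("The name of the game is the luck of the draw.", KNOWN))
```

Even a crude measure like this picks up habits a writer rarely thinks about, which is why partial matches can still narrow things down when combined with other signals.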
Another reason why I think that it’s a threat is that even a partial result is useful. TA doesn’t just identify you; it profiles you. And even without knowing your exact name and address, info like age, sex, gender, location, social class, educational background etc. is still useful for advertisers and similar.
(Besides the Federalist Papers and Robert Hanssen, another interesting example would be how the Unabomber was captured. It illustrates better how the analysis almost never relies on a single piece of info, but rather multiple pieces that are then glued together into a coherent profile.)
(Also sorry for nerding out about this, it’s just a topic that I happen to enjoy.)