Why have most Lemmy instances blocked Threads?

DenizEfe@lemm.ee · 1 year ago

Why have most Lemmy instances blocked Threads?

SavvyWolf · 1 year ago

There are multiple reasons depending on who you ask and the specific instance:

Privacy - Meta has a really bad track record of user privacy. There is the worry that federating with them will result in them scraping user data from users (which IMO is a bit silly - Meta can and probably is scraping all the available public information anyway, defederating doesn’t really fix that).
Moderation - Meta is notoriously bad (compared to Fediverse servers) at moderating their content. Admins and mods don’t want to have to spend a lot of time dealing with trolls coming from Threads.
Terms of service - I’ve not looked into it, but I assume that Threads has a strict acceptable use policy for content on other instances, which presumably they enforce unilaterally. Instance admins might not want to deal with complying with that, so they just don’t bother federating at all.
Ideology - “The fediverse should not be influenced by a company!” Not all instances are like this, but there are some that see the fediverse as a statement against capitalist greed and megacorps.

givesomefucks@lemmy.world · 1 year ago

which IMO is a bit silly - Meta can and probably is scraping all the available public information anyway, defederating doesn’t really fix that

If they’re federated everything gets sent to them automatically.

If they’re not, they only get the info users see, and it’s a hassle to compile, index, and store. Like, they could keep a running index of every user page, but why would they?

SavvyWolf · 1 year ago

The only information that actually gets federated to other servers is public information that is globally visible anyway. Fediverse servers don’t (or at least SHOULDN’T) trust each other.

It’s not actually that hard to index and store the information, especially if you just want textual post data - Mastodon at least can serve you an easy to parse version of a user’s posts if you request it. Sure you need to poll for the information rather than it just being sent to you, but I think if they were motivated enough they could do it.
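To give a rough idea of how little work the parsing side is, here is a minimal sketch. It assumes the server exposes a user's posts as an ActivityPub outbox page (an ActivityStreams `OrderedCollectionPage` with `Create` activities wrapping `Note` objects). `SAMPLE_PAGE` is made-up data in that shape, not output from any real server, and the polling and fetching part is left out entirely.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample shaped like one page of an ActivityPub outbox.
# Real servers include many more fields than this.
SAMPLE_PAGE = json.dumps({
    "type": "OrderedCollectionPage",
    "orderedItems": [
        {"type": "Create",
         "published": "2024-01-02T03:04:05Z",
         "object": {"type": "Note",
                    "content": "<p>Hello <b>fediverse</b></p>"}},
        # A boost: the object is only a URL, so there is no text to extract.
        {"type": "Announce",
         "published": "2024-01-03T00:00:00Z",
         "object": "https://example.social/users/bob/statuses/1"},
    ],
})


class _TextOnly(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def extract_posts(page_json):
    """Return (published, plain_text) for each original post on the page."""
    page = json.loads(page_json)
    posts = []
    for activity in page.get("orderedItems", []):
        obj = activity.get("object")
        if activity.get("type") == "Create" and isinstance(obj, dict):
            text = strip_html(obj.get("content", ""))
            posts.append((activity.get("published"), text))
    return posts


if __name__ == "__main__":
    for published, text in extract_posts(SAMPLE_PAGE):
        print(published, text)
```

A scraper would still have to fetch each page on a schedule and follow the paging links, which is the polling cost mentioned above.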

Sunforged@lemmy.ml · 1 year ago

Easy to spin up a scraper server that isn’t Threads to collect data.

DenizEfe@lemm.ee · 1 year ago

Thank you for answering my question

tal@lemmy.today · edit-2 1 year ago

Privacy - Meta has a really bad track record of user privacy. There is the worry that federating with them will result in them scraping user data from users (which IMO is a bit silly - Meta can and probably is scraping all the available public information anyway, defederating doesn’t really fix that).

Yeah, I was gonna say…as things stand, the privacy situation on the Threadiverse is in many respects weaker than on, say, Reddit. Yeah, you get to choose the third-party app that may live on a phone, or the Web client, and your instance only directly pushes some data out via federation.

However, if you’re on the Threadiverse, then you have no idea what a given Threadiverse instance out there pulling in federated data is storing. You don’t know how secure your instance is, even if your instance admin has the best intentions. Unless your instance is whitelisting a very limited set of trusted instances or isn’t federating at all and is private, treating anything you put out there as basically accessible to every organization and company is probably a good idea.

Your own instance may not retain deleted (including by mods or admins) or edited comments, but it’s a good bet that if someone else’s instance isn’t retaining them yet, one eventually will, and it’ll permit recovering them. There were people doing this on Reddit via pushshift.io.

It’s probably possible for people to analyze comment activity to detect where someone’s instance is, based on time-of-day activity, holiday activity, and so forth; there were several sites doing this for Reddit.
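The time-of-day part is simple enough to sketch. The following is a toy heuristic, not what any of those Reddit sites actually did: it bins timezone-aware comment timestamps by UTC hour, finds the quietest six-hour stretch, and assumes that stretch is the poster sleeping from about 01:00 local time. The window width and the 01:00 figure are assumptions I picked for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_histogram(timestamps):
    """Count comments per UTC hour. Timestamps must be timezone-aware."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_window_start(hist, width=6):
    """UTC hour at which the least active `width`-hour window begins.

    The window wraps around midnight. Ties go to the earliest hour.
    """
    def activity(start):
        return sum(hist[(start + i) % 24] for i in range(width))

    return min(range(24), key=activity)


def guess_utc_offset(timestamps, width=6, assumed_local_quiet_start=1):
    """Guess a whole-hour UTC offset in the range -11..12.

    Assumes the quiet window starts at `assumed_local_quiet_start`
    o'clock local time, which is a guess about sleep habits.
    """
    start = quietest_window_start(hourly_histogram(timestamps), width)
    offset = (assumed_local_quiet_start - start) % 24
    return offset - 24 if offset > 12 else offset


if __name__ == "__main__":
    # Synthetic poster who is silent from 06:00 to 11:59 UTC.
    active_hours = [h for h in range(24) if not 6 <= h <= 11]
    stamps = [datetime(2024, 1, 1, h, tzinfo=timezone.utc)
              for h in active_hours]
    print(guess_utc_offset(stamps))
```

A night-shift worker or someone who posts on a schedule would throw this off completely, so it only narrows things down statistically.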

And it’s probably not that hard to obtain a user’s IP address, so either you want to be okay with what you’re posting maybe being linked to your IP or avoid having a persistent IP, like, via use of a VPN or something. Probably possible for someone to at least roughly geolocate an IP. Might be possible to correlate it with other logs; if someone, for example, has access to someone’s Steam login history and can link that to an identity and can link both to an IP address at different times, they can probably deanonymize a user.

There are also text classifiers that can run on comments and extract things like someone’s likely gender, or anything else you’ve trained a statistical text classifier for on a large-enough corpus. Probably can get at least approximate age, and I’ve seen classifiers that aim at identifying roughly where someone lives. Some famous examples of deanonymization via text:

Robert Hanssen, a very serious mole in the FBI, was caught after he used the phrase “the purple-pissing Japanese”, which was a quote from General George Patton, in an anonymous context, and someone had heard him use it once before (not a computer, just humans managed to pull this off). It’s probably possible to cross-correlate unusual phrases across identities; it doesn’t take many to form a unique signature.
The Federalist Papers were an important set of documents written under the pseudonym “Publius” by several major Founding Fathers in the US – Alexander Hamilton, James Madison, and John Jay. They argued for the ratification of the US Constitution. Some centuries later, computer-based Bayesian statistical analysis became practical, and it became possible to deanonymize most of the articles – train a classifier on their known works, then run it on their anonymous works, and get an estimate with confidence level as to the identity of the author. That was pretty nifty from a historian’s standpoint, but it’s worth considering that the same technique is also viable today to deanonymize people.
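For a sense of what "train a classifier on their known works, then run it on their anonymous works" means in practice, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing and a uniform prior over authors. It is a simplification for illustration, not a reconstruction of the historical Federalist analysis, and the training texts and author names below are invented toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(samples_by_author):
    """Build per-author word counts from {author: [text, ...]}."""
    model = {author: Counter(w for text in texts for w in tokenize(text))
             for author, texts in samples_by_author.items()}
    vocab = set().union(*model.values())
    return model, vocab


def log_likelihoods(model, vocab, text):
    """Log P(text | author) for each author, ignoring unseen words."""
    words = [w for w in tokenize(text) if w in vocab]
    scores = {}
    for author, counts in model.items():
        total = sum(counts.values()) + len(vocab)  # add-one smoothing
        scores[author] = sum(math.log((counts[w] + 1) / total)
                             for w in words)
    return scores


def attribute(model, vocab, text):
    """Posterior probability per author, assuming a uniform prior."""
    scores = log_likelihoods(model, vocab, text)
    top = max(scores.values())
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    norm = sum(weights.values())
    return {a: w / norm for a, w in weights.items()}


# Invented toy corpus; real attribution needs far more text per author.
KNOWN = {
    "author_a": ["upon the whole while we consider",
                 "upon reflection while it is upon us"],
    "author_b": ["whilst we consider on the whole",
                 "whilst it is on reflection"],
}

if __name__ == "__main__":
    model, vocab = train(KNOWN)
    print(attribute(model, vocab, "upon this matter while we wait"))
```

The posterior it prints plays the role of the "estimate with confidence level", though with a corpus this small the numbers mean very little. The same structure works on a pile of scraped comments from a known account versus an anonymous one.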

With Reddit or similar, Reddit’s probably gonna data-mine what they can and may sell it to some parties, but they also probably won’t be directly feeding it to random unsavory people, though it may wind up in their hands.

There are probably a couple of good ways that lemmy/kbin could legitimately improve privacy.

I don’t know what the logging situation is today, but having the option for an admin to bound log retention time might be a good idea; retaining enough for abuse and debugging, but not leaving a lot of data around in case someone breaks in and swipes 'em. You still need to trust your instance admin, and the lemmy/kbin software, but at least it’s possible for an admin to bound what gets swiped if someone breaks in.
Not allowing remote images in comments, which is presently permitted; as I point out above, that’s going to let user IP addresses be extracted by parties other than their instance. At least give the user the option to block them, and have home instances maybe have an option to cache them and serve them locally…that’ll create its own storage and bandwidth concerns, but one can at least imagine heuristics to deal with that.
Having some form of public/private key authentication – like, I can upload a pubkey to an account – to permit someone to prove that they are who they say they are in the event of later instance compromise.
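As an illustration of the remote-image point, here is a rough sketch of how a home instance might rewrite Markdown image links in a comment so that readers fetch them from the instance rather than from a third-party host. The `/image_proxy?url=` path and the host names are hypothetical, made up for this example, and this is not a description of how lemmy or kbin handle images.

```python
import re
from urllib.parse import quote, urlsplit

# Markdown image syntax: ![alt](url) with an optional trailing "title".
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)([^)]*)\)')


def is_remote(url, home_host):
    """True if the URL names a host other than the home instance.

    Relative URLs have no host and are treated as local.
    """
    host = urlsplit(url).hostname
    return host is not None and host.lower() != home_host.lower()


def rewrite_remote_images(markdown, home_host,
                          proxy_path="/image_proxy?url="):
    """Point remote images at a (hypothetical) caching proxy endpoint."""
    def repl(match):
        alt, url, rest = match.group(1), match.group(2), match.group(3)
        if not is_remote(url, home_host):
            return match.group(0)  # leave local images untouched
        proxied = f"https://{home_host}{proxy_path}{quote(url, safe='')}"
        return f"![{alt}]({proxied}{rest})"

    return IMAGE_RE.sub(repl, markdown)


if __name__ == "__main__":
    comment = "look: ![cat](https://tracker.example/cat.png)"
    print(rewrite_remote_images(comment, "lemmy.example"))
```

With something like this in place, the third-party host only ever sees the instance's IP address. The storage and bandwidth cost lands on the instance, which is where the caching heuristics would come in.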