How the Great Firewall of China Detects and Blocks Fully Encrypted Traffic

Submitted 1 year ago by tardigrada@beehaw.org to technology@beehaw.org

https://gfw.report/publications/usenixsecurity23/en/

One of the cornerstones in censorship circumvention is fully encrypted protocols, which encrypt every byte of the payload in an attempt to “look like nothing”. In early November 2021, the Great Firewall of China (GFW) deployed a new censorship technique that passively detects—and subsequently blocks—fully encrypted traffic in real time. The GFW’s new censorship capability affects a large set of popular censorship circumvention protocols, including but not limited to Shadowsocks, VMess, and Obfs4. Although China had long actively probed such protocols, this was the first report of purely passive detection, leading the anti-censorship community to ask how detection was possible.

The paper discloses findings and suggestions to the developers of different anti-censorship tools, helping millions of users successfully evade this new form of blocking.

Comments

tal@lemmy.today 1 year ago
That can be used as a heuristic, and it may be good enough to disrupt widespread use of VPN protocols.

But it’s going to be hard to create an ironclad mechanism against steganographic methods, because any protocol that contains random data or data that can’t be externally validated can be used as a VPN tunnel.

I can create “VPN over FTP”, where I have a piece of software that takes in a binary stream and generates a comma-separated-value file that looks something like this:

employee,id,position
John Smith,54891,Recruiter
Anne Johnson,93712,Receptionist

etc.

Then at the other end, I convert back.
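A minimal sketch of that round trip, assuming a made-up mapping: the name lists, the position list, the 4-byte length prefix, and the function names `encode` and `decode` are all hypothetical choices for illustration. Each row carries three bytes of payload: one byte picks the first and last name, two bytes become the five-digit id, and the position column is pure decoration.

```python
import csv
import io
import os

# Hypothetical word lists: 16 first names x 16 last names = 256 combinations,
# so one "employee" name encodes exactly one byte.
FIRST = ["John", "Anne", "Maria", "David", "Linda", "James", "Sarah", "Peter",
         "Laura", "Kevin", "Emily", "Brian", "Nancy", "Jason", "Karen", "Scott"]
LAST = ["Smith", "Johnson", "Brown", "Miller", "Davis", "Wilson", "Moore", "Taylor",
        "Thomas", "White", "Harris", "Martin", "Clark", "Lewis", "Walker", "Young"]
# Decoration only: the position column carries no payload in this sketch.
POSITIONS = ["Recruiter", "Receptionist", "Analyst", "Engineer"]


def encode(data: bytes) -> str:
    """Turn arbitrary bytes into a CSV file that looks like an employee list."""
    # Prefix the real length so the decoder can drop the zero padding.
    framed = len(data).to_bytes(4, "big") + data
    framed += b"\x00" * (-len(framed) % 3)  # pad to a whole number of rows
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["employee", "id", "position"])
    for i in range(0, len(framed), 3):
        b0, b1, b2 = framed[i:i + 3]
        name = f"{FIRST[b0 >> 4]} {LAST[b0 & 0x0F]}"   # byte 0 -> name
        emp_id = f"{(b1 << 8) | b2:05d}"               # bytes 1-2 -> 5-digit id
        position = POSITIONS[(i // 3) % len(POSITIONS)]
        writer.writerow([name, emp_id, position])
    return out.getvalue()


def decode(text: str) -> bytes:
    """Recover the original bytes from a CSV produced by encode()."""
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the header row
    framed = bytearray()
    for name, emp_id, _position in rows:
        first, last = name.split(" ")
        framed.append((FIRST.index(first) << 4) | LAST.index(last))
        framed += int(emp_id).to_bytes(2, "big")
    length = int.from_bytes(framed[:4], "big")
    return bytes(framed[4:4 + length])


if __name__ == "__main__":
    secret = os.urandom(32)  # stand-in for a chunk of encrypted tunnel traffic
    sheet = encode(secret)
    print(sheet)
    assert decode(sheet) == secret
```

This only shows the mechanics. Uniformly random ids and name choices are statistically unlike a real staff list, so a serious encoder would need to shape its output to match plausible data.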

So I have an FTP connection that’s transmitting a file that looks like this.

That’s human-readable, but the problem for the censor is that it’s hard to tell that all of those fields might actually be encoding data, which might well be an encrypted VPN connection.

You can do traffic analysis, look for bursty traffic, but the problem is that as long as the VPN user is willing to blow bandwidth on it, that’s easy to counter by just filling in the gaps with padding data.
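The padding idea can be sketched as a constant-rate framer: on every tick the sender emits one fixed-size frame, filled with queued data if there is any and with random filler otherwise, so the wire looks equally busy whether or not the user is doing anything. The frame size, the 2-byte length header, and the names `ConstantRateFramer` and `unframe` are assumptions for illustration, not part of any real VPN protocol.

```python
import os

FRAME_SIZE = 512                 # hypothetical fixed frame size, in bytes
HEADER = 2                       # bytes recording how much of the frame is real data
CAPACITY = FRAME_SIZE - HEADER   # real payload bytes that fit in one frame


class ConstantRateFramer:
    """Emits same-sized frames whether or not there is data to send."""

    def __init__(self) -> None:
        self._pending = bytearray()

    def queue(self, data: bytes) -> None:
        """Buffer application data until the next tick."""
        self._pending += data

    def next_frame(self) -> bytes:
        """Call once per tick: returns exactly FRAME_SIZE bytes every time."""
        chunk = bytes(self._pending[:CAPACITY])
        del self._pending[:CAPACITY]
        filler = os.urandom(CAPACITY - len(chunk))  # padding fills the gap
        return len(chunk).to_bytes(HEADER, "big") + chunk + filler


def unframe(frame: bytes) -> bytes:
    """Strip the header and padding, returning only the real data."""
    n = int.from_bytes(frame[:HEADER], "big")
    return frame[HEADER:HEADER + n]
```

In a real tunnel the whole frame, header included, would be encrypted before sending, and the sender would call `next_frame()` on a fixed timer. The cost is exactly the one mentioned above: bandwidth is spent even when idle.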

You can maybe detect one format, but I’d wager that it’s not that hard to (a) produce these formats manually with a lot less effort than it takes to detect new ones, and (b) probably automatically train one that can “learn” to generate similar-looking data by just being fed a bunch of files to emulate.

A censor can definitely raise the bar for running a VPN. They don’t need a 100% solution. And they can augment automated firewall blocks with severe legal penalties aimed at people who go out of their way to bypass blocks.

But on the flip side, steganography is probably going to be impossible to fully counter if one intends to blacklist rather than whitelist traffic.
- jarfil@beehaw.org 1 year ago
  
  automatically train one that can “learn” to generate similar-looking data by just being fed a bunch of files to emulate
  
  Sounds like a job for a “compression prompt” for ChatGPT… [and thus, the AI wars began]
  
scrubbles@poptalk.scrubbles.tech 1 year ago
This really sucks, but we do know it’s a cat and mouse game. AI or hand-coded, it doesn’t matter; it’s pattern recognition. It’s only a matter of time until someone figures out how to change the pattern in a way that isn’t detected.
