Tracing the Network Traffic Fingerprinting Techniques of OpenVPN (Comms ACM)
While designing VPN software is not rocket science, designing an effective VPN obfuscation layer is much more difficult.
Below is an excerpt from a Communications of the ACM Technical Perspective, written by Distinguished Professor of Computer Science Gene Tsudik.
It is well known that a fundamental conflict exists between various entities (for example, ISPs, corporations, and governments) that want to control and/or monitor Internet activity and individual users (or groups thereof) who desire privacy and/or censorship circumvention. This conflict manifests itself as a global and seemingly never-ending arms race between privacy technologies (for example, Tor and VPNs) and traffic analysis methods that aim to identify (fingerprint) and block communication conducted using the former.
This is not a universal battle between good and evil. There are numerous examples where upstanding users (for example, citizen journalists, dissidents, whistle-blowers) fight the proverbial “good fight” to circumvent oppression. However, it is just as simple to turn the tables and consider settings where malicious actors (for example, trolls, spies, terrorists, criminals) are pitted against “righteous” entities that attempt to prevent their activities. Also, sometimes the motivation for blocking cloaked traffic is not nefarious but is simply driven by the need to maintain acceptable quality of service (QoS) for other traffic.
Security researchers are not supposed to take sides in this battle. It is equally legitimate to explore privacy and anti-privacy techniques as well as to attack either. Both sides in the conflict must be aware of flaws and strengths of the tools they use. To this end, the accompanying paper investigates network traffic fingerprinting (simply fingerprinting hereafter) of a very popular privacy technique—OpenVPN.
Fingerprinting is the art of probabilistically identifying, ideally with low error rates, traffic patterns that correspond to a particular targeted activity, VPN use in this case. As the paper illustrates, OpenVPN is susceptible to quite accurate fingerprinting via a two-stage process: passive traffic analysis (Filter), followed by active probing (Prober). It reports a success rate above 85% in identifying OpenVPN connections. Furthermore, even when coupled with an optional obfuscation layer (for example, Chameleon or Stealth), OpenVPN traffic remains detectable. These results were obtained in partnership with Merit, a Michigan-based ISP that serves about one million users. Such a partnership is necessary for any credible study or experiment involving real-world traffic analysis, since the experimenters must play the combined role of the ISP and the censor.
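The two-stage structure described above can be pictured as a funnel: a cheap passive check runs on every flow, and the more expensive active probe runs only on servers the first stage flags. The following is a minimal sketch of that control flow only, not the paper's implementation. The `Flow` record, the `looks_suspicious` field, the `two_stage_detect` function, and the toy filter and prober are all hypothetical names invented for illustration, and the addresses come from documentation-reserved ranges.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Flow:
    """Hypothetical summary of one observed connection."""
    server: str              # remote endpoint as "ip:port"
    looks_suspicious: bool   # stand-in for a passive traffic-analysis verdict


def two_stage_detect(
    flows: Iterable[Flow],
    passive_filter: Callable[[Flow], bool],
    active_probe: Callable[[str], bool],
) -> List[str]:
    """Return servers flagged by the filter and then confirmed by the prober."""
    confirmed: List[str] = []
    already_probed = set()
    for flow in flows:
        # Stage 1 (Filter): passive check applied to every flow.
        if not passive_filter(flow):
            continue
        # Probe each flagged server at most once.
        if flow.server in already_probed:
            continue
        already_probed.add(flow.server)
        # Stage 2 (Prober): active check applied only to flagged servers.
        if active_probe(flow.server):
            confirmed.append(flow.server)
    return confirmed


# Toy data: which servers "really" run a VPN is simply asserted here.
TOY_VPN_SERVERS = {"198.51.100.7:1194"}

flows = [
    Flow("198.51.100.7:1194", True),
    Flow("203.0.113.5:443", False),
    Flow("198.51.100.7:1194", True),   # repeat connection to the same server
    Flow("192.0.2.9:443", True),       # flagged by the filter, rejected by the prober
]

probe_calls: List[str] = []


def toy_probe(server: str) -> bool:
    """Hypothetical prober: records the call and looks up the toy ground truth."""
    probe_calls.append(server)
    return server in TOY_VPN_SERVERS


confirmed = two_stage_detect(flows, lambda f: f.looks_suspicious, toy_probe)
print(confirmed)
print(probe_calls)
```

In this toy run the prober is called for only two of the four flows. That is the point of placing a passive stage first: active probing is reserved for a small set of candidates, and a flow that the filter flags in error can still be cleared by the second stage.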
Interestingly, the Deep Packet Inspection (DPI) approach taken in this work is inspired by the infamous Great Firewall of China. DPI is well known and widely used, and its features and accuracy have improved over the years. The main issue is whether it can be used in real time and at scale. The authors show it can. Moreover, the proposed method outperforms prior ML-based techniques in terms of false-positive rate (FPR).
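As a reminder of what the FPR measures, it is the fraction of non-target flows that a detector wrongly flags: FP / (FP + TN). The short sketch below computes it and shows why even a small FPR matters at scale. The counts are made-up illustrative numbers, not results from the paper, and the function name is hypothetical.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN): share of non-target flows that were wrongly flagged."""
    negatives = false_positives + true_negatives
    if negatives == 0:
        raise ValueError("FPR is undefined when there are no negative samples")
    return false_positives / negatives


# Illustrative counts only (assumed, not taken from the paper).
fpr = false_positive_rate(false_positives=3, true_negatives=9_997)

# Expected number of innocent flows flagged if the same rate held
# over a hypothetical one million non-VPN flows.
expected_false_alarms = fpr * 1_000_000
print(fpr, expected_false_alarms)
```

Because non-VPN flows vastly outnumber VPN flows on a real network, the absolute number of false alarms grows with traffic volume, which is why a lower FPR is a meaningful advantage for a detector meant to run at ISP scale.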
One thorny issue in conducting this type of study is ethics. Genuine VPN traffic (especially obfuscated traffic) is, by its very nature, sensitive, and any information collected about it must be treated with the utmost care.
Read the full perspective in Communications of the ACM.