HTTPS Traffic Analysis Project Homepage

The HTTPS Traffic Analysis Project is a research effort of the SCRUB center at UC Berkeley, funded by Intel, that contributes techniques for traffic analysis attack, defense and evaluation. This page provides code, data and documentation related to HTTPS traffic analysis.

Documentation

I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis.
Brad Miller, Ling Huang, Anthony D. Joseph and J.D. Tygar.
Proceedings of the Privacy Enhancing Technologies Symposium (PETS), July 2014.

An arXiv report is also available, providing additional documentation that did not fit within the page limit of the PETS publication.

Code

To facilitate further research, we are making much of the code used in our work publicly available. The code includes our implementation of the BoG attack developed in this project, as well as the attacks by Panchenko et al., Liberatore & Levine and Wang et al. The code release, code.tar.gz, also includes a README to help you get started.

Note that our code depends on a number of other open tools and libraries, including NumPy, SciPy, scikit-learn, LIBLINEAR, LIBSVM, sofia-ml and py-leveldb.

Data

In addition to code, we are also releasing the data used in our analysis. Due to the large volume of the data, we have divided the release into several tarballs, linked below. Each link is followed by the tarball's compressed and uncompressed sizes and its MD5 checksum. Once all data is downloaded and uncompressed, data_release_check.py can be run on it to verify a series of internal relationships within the data. The script also contains comments explaining each of these relationships and providing introductory documentation.

Note that pcaps.tar.gz includes only the first 96 bytes of each packet, since traffic analysis requires only the metadata found in packet headers. Due to the high computational cost of some techniques included in our analysis, results.tar.gz contains a number of intermediary feature files and models generated by the attacks. An abbreviated version, results_accuracy_only.tar.gz, contains only the small files stating overall attack accuracy.
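
To illustrate why a 96-byte snapshot is enough, here is a minimal Python sketch (not part of the code release) that reads per-packet metadata from a capture. It assumes the files use the classic libpcap format rather than pcapng, which this page does not state. Each libpcap record header stores the packet's original on-wire length next to the number of bytes actually saved, so packet sizes and timing survive truncation. The names read_pcap_metadata and PacketMeta are hypothetical.

```python
import struct
from typing import BinaryIO, Iterator, NamedTuple


class PacketMeta(NamedTuple):
    timestamp: float   # seconds since the epoch
    captured_len: int  # bytes actually stored in the file (at most the snaplen)
    original_len: int  # length of the packet on the wire, kept even when truncated


def read_pcap_metadata(f: BinaryIO) -> Iterator[PacketMeta]:
    """Yield per-packet metadata from a classic libpcap file (not pcapng)."""
    header = f.read(24)  # global header: magic, version, zone, sigfigs, snaplen, linktype
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = header[:4]
    # The magic number encodes byte order and timestamp resolution (us or ns).
    if magic == b"\xd4\xc3\xb2\xa1":
        endian, ts_div = "<", 1e6
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian, ts_div = ">", 1e6
    elif magic == b"\x4d\x3c\xb2\xa1":
        endian, ts_div = "<", 1e9
    elif magic == b"\xa1\xb2\x3c\x4d":
        endian, ts_div = ">", 1e9
    else:
        raise ValueError("not a classic pcap file")

    # Record header: ts_sec, ts_frac, incl_len (captured), orig_len (on the wire).
    record = struct.Struct(endian + "IIII")
    while True:
        rec = f.read(record.size)
        if len(rec) < record.size:
            return  # end of file
        ts_sec, ts_frac, incl_len, orig_len = record.unpack(rec)
        f.read(incl_len)  # skip the stored (truncated) packet bytes
        yield PacketMeta(ts_sec + ts_frac / ts_div, incl_len, orig_len)
```

For example, iterating over read_pcap_metadata(open(path, "rb")) yields one PacketMeta per packet, where original_len may exceed captured_len. The header is parsed by hand with struct only to keep the sketch dependency-free; a packet library such as dpkt or scapy would work as well.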

  • pcaps.tar.gz (17G, 38G) md5: 19bcbf79dfbbe52a67b95eea9414cde4
  • sitemaps.tar.gz (44K, 1.2M) md5: 76773c0b4894a05df5688ba436609115
  • features.tar.gz (3.5G, 18G) md5: 09bb83802082159d0a355c56734aef15
  • folds.tar.gz (24M, 332M) md5: b080286e6338969192f851a58f8417fb
  • results.tar.gz (29G, 88G) md5: 4b5609eec14e7bab118a1f7c78a781d5
  • results_accuracy_only.tar.gz (56K, 13M) md5: 5564c57bbac35ace51161a573f9c6392
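
Before extracting, each tarball can be checked against the MD5 checksums listed above (for example with the md5sum command). The following is a minimal Python sketch of the same check, separate from data_release_check.py, which instead verifies relationships within the uncompressed data. It assumes all tarballs sit in one directory under their original names; the function names md5_of_file and verify_downloads are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional, Union

# Checksums of the compressed tarballs, as listed on this page.
EXPECTED_MD5 = {
    "pcaps.tar.gz": "19bcbf79dfbbe52a67b95eea9414cde4",
    "sitemaps.tar.gz": "76773c0b4894a05df5688ba436609115",
    "features.tar.gz": "09bb83802082159d0a355c56734aef15",
    "folds.tar.gz": "b080286e6338969192f851a58f8417fb",
    "results.tar.gz": "4b5609eec14e7bab118a1f7c78a781d5",
    "results_accuracy_only.tar.gz": "5564c57bbac35ace51161a573f9c6392",
}


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so multi-gigabyte tarballs never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Union[str, Path]) -> Dict[str, Optional[bool]]:
    """Map each expected tarball to True (match), False (mismatch) or None (missing)."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.exists() else None
    return results
```

Reading in 1 MiB chunks keeps memory use constant regardless of file size, which matters for pcaps.tar.gz (17G) and results.tar.gz (29G).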

Press Coverage

  • Wall Street Journal: Researchers use big data to get around encryption
  • MIT Technology Review: Statistical tricks extract sensitive data from encrypted communications
  • PC World: Even encrypted Web traffic can reveal highly sensitive information
  • Ars Technica: New attack on HTTPS crypto might reveal if you're pregnant or have cancer
  • Threatpost: New attacks on HTTPS traffic reveal plenty about your web surfing
  • CSO Online: Researchers attack secured Internet activity to mine personal data