
Data is the new oil

  • Very useful
  • But very dangerous

Issues around data collection

  • Data linking - correlating data sets
  • (De)anonymisation
  • Massive Scale
  • Data breach
  • Fishing expeditions -> using data beyond the purpose it was collected for
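
A rough sketch of how data linking can undo anonymisation: two data sets that look harmless on their own are joined on shared attributes (quasi-identifiers). The records, field names and the `link` helper below are all made up for illustration; real linkage attacks tend to use fuzzier matching.

```python
# Hypothetical toy data: an "anonymised" medical release and a public register.
medical = [
    {"postcode": "2000", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2010", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]
register = [
    {"name": "Alice", "postcode": "2000", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "postcode": "2010", "birth_year": 1990, "sex": "M"},
    {"name": "Carol", "postcode": "2010", "birth_year": 1972, "sex": "F"},
]

# Assumption: these three fields appear in both data sets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def link(anonymised, identified, keys=QUASI_IDENTIFIERS):
    """Join two data sets on shared quasi-identifiers, keeping unique matches only."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    linked = []
    for row in anonymised:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # exactly one candidate -> re-identified
            linked.append({**matches[0], **row})
    return linked


reidentified = link(medical, register)
for person in reidentified:
    print(person["name"], "->", person["diagnosis"])
```

Neither data set contains both a name and a diagnosis, yet the join produces both for every row with a unique match.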

Tools aren’t built for users

Tools are often built for government agencies to easily retrieve your data.

Who has your data?

Everyone, to be honest.

  • Facebook
  • Google
  • Apple

What data?

Location history, calls, emails, files, age, etc.

Who uses your data?

  • Private companies
  • Government
  • Intelligence communities

Data Lakes

Pooling ‘streams’ of data into one big ‘lake’.

Think twice

  • It’s end-to-end encrypted
    • Malicious public/private keys might be added
  • Algorithms, not people, look at your data
    • People build and operate those algorithms
  • Only does X under Y conditions
    • Bugs.
  • It’s locked down - only accessible by X
    • Rogue sysadmins (of data centers), e.g. Snowden
  • Thorough auditing
    • Oh... do you now?
  • Secure
  • Anonymised

Anonymisation Techniques

  • Redacting
  • Encrypting / Hashing
  • Pseudonyms
  • Binning (generalising values into ranges, e.g. age 34 -> 30-39)
  • Statistical noise
  • Aggregation
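
The techniques above could be sketched roughly as follows. This is a minimal illustration, not a vetted implementation: the function names, the sample records and `SECRET_KEY` are all hypothetical, the keyed hash (HMAC) is just one way to derive pseudonyms, and Laplace noise is just one common choice of statistical noise.

```python
import hashlib
import hmac
import random
from collections import Counter

# Assumption: a secret key stored separately from the released data.
SECRET_KEY = b"hypothetical-secret"


def redact(record, fields):
    """Redacting: drop the named fields entirely."""
    return {k: v for k, v in record.items() if k not in fields}


def pseudonymise(value, key=SECRET_KEY):
    """Hashing / pseudonyms: replace a value with a stable keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def bin_age(age, width=10):
    """Binning: generalise an exact age into a range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def add_noise(count, scale=1.0, rng=random):
    """Statistical noise: add Laplace noise (difference of two exponentials)."""
    return count + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def aggregate(records, field):
    """Aggregation: publish counts per group rather than individual rows."""
    return Counter(r[field] for r in records)


# Hypothetical sample data.
people = [
    {"name": "Alice", "email": "alice@example.com", "age": 34},
    {"name": "Bob", "email": "bob@example.com", "age": 37},
    {"name": "Carol", "email": "carol@example.com", "age": 52},
]

released = [
    {"id": pseudonymise(p["email"]), "age_band": bin_age(p["age"])}
    for p in people
]
counts = aggregate(released, "age_band")
noisy_counts = {band: add_noise(n) for band, n in counts.items()}
```

None of these steps guarantees anonymity on its own: a pseudonym is still a stable identifier, and bins that contain only one person still single that person out.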