Even if you think you’re browsing Twitter “anonymously,” machine learning algorithms can still pinpoint you in a crowd of 10,000 other users using metadata associated with your posts, according to a new study.
“Metadata” refers to data about other data. In the context of a Twitter post, this includes the date and time of the post, the number of characters in it, the device it was posted from, its grammatical style, the location it was posted from, and a host of other markers. The average tweet contains about 144 pieces of metadata.
Using machine learning, researchers at University College London and the Alan Turing Institute have developed a method of identifying individual users with 96.7 percent accuracy using metadata alone. Even if your handle is “LibPwner2016,” the metadata can still reveal who you are. And most of that metadata is accessible through Twitter’s API.
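To make the idea concrete, here is a rough sketch of how metadata alone could point to an account. It is not the researchers' method: the three fields (posting hour, tweet length, follower count), the account names, and every number are invented for illustration, and a simple nearest-profile match stands in for whatever models the study actually used.

```python
# Toy illustration only: the field names and numbers below are invented,
# and nearest-centroid matching is a stand-in, not the study's technique.
from collections import defaultdict
from math import sqrt

# Each record: (account, [hour_posted, tweet_length, followers_in_thousands])
TRAINING = [
    ("account_a", [9.0, 120.0, 1.2]),
    ("account_a", [10.0, 130.0, 1.2]),
    ("account_b", [23.0, 40.0, 15.0]),
    ("account_b", [22.0, 55.0, 15.1]),
    ("account_c", [14.0, 200.0, 0.3]),
    ("account_c", [15.0, 210.0, 0.3]),
]

def build_centroids(records):
    """Average each account's metadata vectors into a single profile."""
    grouped = defaultdict(list)
    for account, features in records:
        grouped[account].append(features)
    return {
        account: [sum(col) / len(col) for col in zip(*rows)]
        for account, rows in grouped.items()
    }

def scales(records):
    """Per-feature range, so no single field dominates the distance."""
    columns = list(zip(*(features for _, features in records)))
    return [(max(col) - min(col)) or 1.0 for col in columns]

def identify(features, centroids, feature_scales):
    """Return the account whose profile is closest to the given metadata."""
    def distance(profile):
        return sqrt(sum(((x - p) / s) ** 2
                        for x, p, s in zip(features, profile, feature_scales)))
    return min(centroids, key=lambda account: distance(centroids[account]))

centroids = build_centroids(TRAINING)
feature_scales = scales(TRAINING)

# A new post with no handle attached, only metadata.
guess = identify([22.5, 48.0, 15.0], centroids, feature_scales)
print(guess)
```

The handle never enters the calculation. The match comes entirely from how closely the new post's metadata resembles each account's past behavior, which is why changing a username offers little protection on its own.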
The experiment was run on Twitter, but the researchers say that the same methods can be used to test privacy on other platforms.
“The methods described in this work can be applied to a vast class of platforms and systems that generate metadata with similar characteristics,” the researchers conclude.
This is bad news for Facebook, which has spent much of this year dealing with national scrutiny after repeated scandals involving the loss of sensitive user data to third parties.
The collection of metadata and its implications for individual privacy became a particularly high-profile issue under the Obama administration when whistleblower Edward Snowden revealed that the NSA routinely harvested mass amounts of metadata about Americans’ phone calls.
At the time, James Clapper, then director of national intelligence and now a fierce critic of President Trump, lied to Congress about the existence of the spying program. (Clapper later claimed he forgot it existed.)