The gist: When companies such as Spotify use your data to train their AI, they remove identifying markers such as your name, account number, or anything else a computer or person could use to immediately identify you. What’s left is the raw data on your listening preferences, such as how many times you’ve listened to a track and whether you’ve given it a thumbs up. The big idea here is that Spotify and Pandora don’t need to know who you are in order to serve you the music you like. And this matters because privacy is a huge issue: we have to take companies at their word when they claim they won’t sell our private information or exploit our identities for the benefit of third-party companies. (A minimal sketch of this kind of identifier-stripping appears at the end of this article.)

The research: Data doesn’t change when you remove labels; it still holds the same information even if you don’t know who generated it. That means, given enough data, a powerful enough system can usually trace it back to the person responsible. This is a scary prospect for people who care about their privacy, but typically there isn’t much to worry about. Companies such as Spotify take great pains to protect their data, as it’s usually in their best interest to safeguard their users. However, there’s no way for companies to protect us from good old-fashioned human intuition. Per the Tel Aviv team’s paper:

The experiments: The researchers didn’t use algorithm-busting AI or digital privacy-smashing techniques to de-anonymize the data. They simply asked study participants to look at playlists containing only three song selections and decide who, among a group of strangers, each list belonged to. Per a university press release:

Quick take: This is astounding. The small study population (N=150) makes it less than perfect, but the resulting accuracy is certainly unexpected. What’s most important here is that humans typically aren’t trained to extract high-level features from anonymized data. That means this experiment proved efficacious in a void. A bad actor could apparently use techniques as simple as combining human intuition with sorting algorithms to de-anonymize data containing more important identifying features than just which songs we’re likely to enjoy (a toy version of this matching idea is sketched below).

As the researchers conclude, “In the digital world we live in today, these findings have far-reaching implications on privacy violations, especially since information about people can be inferred from a completely unexpected source, which is therefore lacking in protection against such violations.”

You can check out the full study here.
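For the curious, here is a minimal sketch of what stripping direct identifiers from a listening record might look like. All field names here (name, account_id, play_counts, thumbs_up) are hypothetical; Spotify’s actual pipeline is not public. Note that the behavioral data, the part that makes recommendations work, survives intact.

```python
# A minimal sketch of identifier-stripping, using hypothetical field names.
# The point: removing labels leaves the behavioral "fingerprint" untouched.

def anonymize(record: dict) -> dict:
    """Drop direct identifiers, keeping only listening behavior."""
    direct_identifiers = {"name", "email", "account_id", "ip_address"}
    return {k: v for k, v in record.items() if k not in direct_identifiers}

user_record = {
    "name": "Jane Doe",
    "account_id": "spotify:user:12345",
    "play_counts": {"Track A": 41, "Track B": 3},
    "thumbs_up": ["Track A"],
}

print(anonymize(user_record))
# {'play_counts': {'Track A': 41, 'Track B': 3}, 'thumbs_up': ['Track A']}
```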
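And here is a toy illustration of the intuition-plus-sorting idea from the quick take: rank known listeners by how much an “anonymous” three-song playlist overlaps with their listening histories. This is not the researchers’ method, which relied on human judgment rather than code; the names and tracks are invented, and simple set intersection stands in for intuition.

```python
# A toy re-identification sketch, not the study's actual method.
# All listeners and tracks below are invented for illustration.

known_listeners = {
    "Alice": {"Track A", "Track B", "Track C", "Track D"},
    "Bob":   {"Track E", "Track F", "Track G"},
    "Carol": {"Track A", "Track F", "Track H"},
}

anonymous_playlist = {"Track A", "Track B", "Track D"}

# Sort candidates by how many playlist tracks appear in their history;
# the best match is the most plausible owner of the "anonymous" list.
ranked = sorted(
    known_listeners.items(),
    key=lambda item: len(item[1] & anonymous_playlist),
    reverse=True,
)

for name, history in ranked:
    print(name, len(history & anonymous_playlist))
# Prints: Alice 3, then Carol 1, then Bob 0.
# The playlist most plausibly belongs to Alice.
```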