In April 2018, the social media giant announced a new initiative to share a database of URLs its users had linked to between January 1, 2017 and February 19, 2019. To that end, the company partnered with an independent research commission, known as Social Science One (SSO), to evaluate the dataset and ensure the shared treasure trove of information adhered to privacy standards. The project was the first-ever attempt to offer insight into disinformation operations on the social network.

Then last month, Facebook published a dataset of 32 million URLs that were shared publicly more than 100 times on its service between those dates. But concerns soon started cropping up after it emerged that what Facebook had provided wasn't what it originally promised, owing to its ongoing struggles with releasing the information while also protecting its users' privacy.

As a result, seven non-profit groups that helped finance the research efforts, including the Knight Foundation and the Charles Koch Foundation, have threatened to end their involvement with the initiative, The New York Times reports. The development is a setback for a company that's been actively trying to move past a string of privacy missteps and scandals over disinformation and fake news.
Differential privacy delays data sharing
One intended use of the Facebook dataset was to understand how news, misinformation or not, spreads across different social media platforms. This would mean, for example, being able to state that "males aged 17 to 25 living in New York City shared a given URL 5,000 times" without disclosing any sensitive personal information associated with any of the individuals.

To address the privacy challenge, Facebook earlier this April outlined an approach based on differential privacy (DP), a statistical technique that makes it possible for companies to collect and share aggregate information about users while learning nothing about any specific person. The method, originally developed by computer scientist Cynthia Dwork in 2006, has since found takers in Silicon Valley, with Apple, Google, and Uber leveraging it to anonymize user data while still gleaning meaningful results. But achieving this goal has proven slow, with "testers discovering there were still ways to de-anonymize the data, or that it lost some of its accuracy."

"We and Facebook have learned how difficult it is to make a database that was not just privacy-protected but at a grand scale," SSO's co-founder Nate Persily was quoted as saying by The New York Times.

For what it's worth, political propaganda campaigns have shown no signs of stopping since they were first uncovered during the 2016 US elections. Last week, a study by the Oxford Internet Institute found that organized social media manipulation has more than doubled, spreading to at least 70 countries in the last two years. More worryingly, Facebook remained the most popular choice for disinformation campaigns, used in 56 countries.

But Facebook and SSO have vowed to press ahead and expand their efforts to identify valuable privacy-protective datasets related to elections and democracy.
“These data offer researchers an opportunity to learn a great deal about communication on social media, and how information travels on Facebook — another step in our long path toward providing industry data for legitimate academic research,” the non-profit said.
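To make the differential-privacy idea above concrete, here is a minimal sketch of the classic Laplace mechanism applied to a share count like the hypothetical "5,000 shares" aggregate mentioned earlier. This is an illustration of the general DP technique, not Facebook's actual implementation; the function name, the epsilon value, and the example count are all assumptions for demonstration.

```python
import math
import random

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Return an epsilon-differentially-private version of a count.

    The Laplace mechanism adds noise drawn from Laplace(0, sensitivity/epsilon).
    For a simple share count, sensitivity is 1: adding or removing any one
    person changes the count by at most 1, so no individual's presence can
    be confidently inferred from the noisy result.
    """
    scale = sensitivity / epsilon
    # Sample Laplace noise via inverse-CDF from a uniform in (-0.5, 0.5).
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Hypothetical aggregate: a URL shared 5,000 times by some demographic group.
true_shares = 5000
noisy_shares = dp_count(true_shares, epsilon=0.5)
```

A smaller epsilon means more noise and stronger privacy but less accuracy, which is exactly the privacy-versus-utility tension the testers reportedly ran into: noise large enough to prevent de-anonymization also erodes the precision researchers need.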