On 3 March 2020, AlgorithmWatch launched a project to monitor Instagram’s newsfeed algorithm. Volunteers could install a browser add-on that scraped their Instagram newsfeeds. Data was sent to a database we used to study how Instagram prioritizes pictures and videos in a user’s timeline.
Over the last 14 months, about 1,500 volunteers installed the add-on. With their data, we were able to show that Instagram likely encouraged content creators to post pictures that fit specific representations of their body, and that politicians were likely to reach a larger audience if they abstained from using text in their publications (Facebook denied both claims). Although we could not conduct a precise audit of Instagram’s algorithm, this research is among the most advanced studies ever conducted on the platform. The project was supported by the European Data Journalism Network and by the Dutch foundation SIDN. It was done in partnership with Mediapart in France, NOS, Groene Amsterdammer and Pointer in the Netherlands, and Süddeutsche Zeitung in Germany, and was covered by dozens of news outlets around the world.
We asked Facebook for comments on our findings prior to publication. The company did not answer our questions but told us on 28 May 2020 that our “research [was] flawed in a number of ways”. On 2 March 2021, they claimed that they "found a number of issues with [our] methodology". They did not list the flaws and issues in question, but we assumed that they carried out a thorough review of our methods and tools prior to issuing such strong statements.
Weaponizing the Terms of Service
This is why we were very surprised when, in early May 2021, Facebook asked us for a meeting. Our project, said Facebook, breached their Terms of Service, which prohibit the automated collection of data. They would have to “mov[e] to more formal engagement” if we did not “resolve” the issue on their terms – a thinly veiled threat.
Section 3.2.3 of their Terms indeed states that one “may not access or collect data from [Facebook’s products] using automated means”. However, we only collected data related to content that Facebook displayed to the volunteers who installed the add-on. In other words, users of the plug-in were only accessing their own feed, and sharing it with us for research purposes.
Facebook also claimed that our system violated the GDPR because some of the collected data stemmed from users who never agreed to the project, whose pictures were shown in the timelines of our volunteers. However, a cursory look at the source code, which we open-sourced, shows that such data was deleted immediately upon arrival at our server.
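For illustration, this kind of data minimization on ingest can be sketched as follows. This is a hypothetical simplification, not the project’s actual open-sourced code: the function name, field names, and consent list are all illustrative assumptions.

```python
# Hypothetical sketch of server-side data minimization on ingest.
# Field names and structure are illustrative, not AlgorithmWatch's real code.

def minimize_on_ingest(timeline_items, consenting_volunteers):
    """Keep only the fields needed to study feed ordering; drop anything
    that identifies third-party users who did not consent to the project."""
    kept = []
    for item in timeline_items:
        record = {
            "position": item["position"],      # rank in the volunteer's feed
            "media_type": item["media_type"],  # e.g. photo, video, carousel
        }
        author = item.get("author")
        if author in consenting_volunteers:
            # Volunteers agreed to share their own data.
            record["author"] = author
        # Identifiers and media of non-consenting authors are never stored.
        kept.append(record)
    return kept
```

The point of such a design is that data about non-participants is discarded at the first point of contact, before anything is written to the research database.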
We designed our add-on with great care so that Facebook cannot identify and prosecute our volunteers. However, Facebook’s reaction shows that any organization that attempts to shed light on one of their algorithms is under constant threat of being sued. Given that Facebook’s Terms of Service can be updated at their discretion (with 30 days’ notice), the company could forbid any ongoing analysis that aims at increasing transparency, simply by changing its Terms.
On 13 July, we took the decision to terminate the project and delete any collected data (media partners still have fully anonymized versions of the data). Ultimately, an organization the size of AlgorithmWatch cannot risk going to court against a company valued at one trillion dollars.
We decided to go public with this story after Facebook shut down the accounts of researchers working on the Ad Observatory at New York University (NYU). They had built a browser add-on that collects data about advertisements on the platform. Their set-up is used by many researchers, including some from the Virality Project, which measures misinformation on Covid vaccines.
Facebook claimed that they offered to provide researchers with the data needed, and that the browser add-on compromised user privacy, citing an order by the Federal Trade Commission (FTC), the US consumer protection agency. However, the FTC called this claim “inaccurate” in a strongly-worded statement released just a day after it was revealed that Facebook blocked the Ad Observatory. The FTC said it supports NYU’s project and encourages Facebook to exempt good-faith researchers from monolithic and self-serving interpretations of privacy law.
This is not the first time Facebook has aggressively gone after organizations that try to empower users to be more autonomous in their use of social media. In August 2020, it threatened Friendly, a mobile app that lets users decide how to sort their newsfeed. In April 2021, it forced several apps that let users access Facebook on their own terms out of the Play Store. There are probably more cases of bullying that we do not know about. We hope that by coming forward, we will encourage other organizations to speak up about their experiences.
Ensuring actual transparency
While Facebook’s new interest in user privacy might surprise some (after all, its founder said that it was not a “social norm” anymore), its claims of wanting to help researchers and civil society organizations could fool lawmakers into taking them at face value.
Researchers cannot rely on data provided by Facebook because the company cannot be trusted. The company has failed to act on its own commitments at least four times since the beginning of the year, according to The Markup, a non-profit news organization that runs its own monitoring effort called Citizen Browser. In January, for instance, in the wake of the Trumpist insurrection in the US, the company promised that it would stop making recommendations to join political groups. Six months later, it was still doing so.
Even Facebook’s Ad Library, one of the company’s flagship transparency projects, suffered “bugs” that harmed its credibility. In December 2019, a few days before the United Kingdom’s general election, almost half of the British advertisements stored in the Library disappeared.
There is no reason to believe that Facebook would provide usable data if researchers were to replace their independently collected data with the company’s. In a study published in June 2020, AlgorithmWatch also rebutted Facebook’s claim that data access cannot be granted in a privacy-preserving way. Intermediary institutions could be established with a mandate to enable such data access frameworks for public interest research, as we argued in our response to the European Commission’s Digital Services Act in September 2020. AlgorithmWatch, together with several partners, is currently investigating YouTube’s recommendation algorithm through the data donation project DataSkop.
Shedding light on Instagram’s algorithms is urgently needed. In early May, Colombian users noticed that the content they posted in relation to ongoing protests in the country tended to disappear. The same happened in Palestine and Israel, where findings hint at systematic efforts to remove certain types of Palestinian content. A few days later, Instagram said the “issue” had been fixed and that it had never intended to silence protesters. However, subsequent reporting by BuzzFeed News showed that more was at play and that moderation teams could arbitrarily silence communities (in this case, Facebook considered the Al Aqsa mosque to be a terrorist organization). Instagram revealed last week that content about ‘political issues’ was de-prioritized in its ‘Reels’ feature (a TikTok-like video format), but the definition of what is political is flimsy.
Users have reported other instances of disappearing posts and shadowbans (when user content is not deleted, but not shown to others). Without independent public interest research and rigorous controls from regulators, it is impossible to know whether Instagram’s algorithms favor specific political opinions over others. Previous reporting in the United States showed that Facebook took some product decisions in order to protect alt-right figures.
Large platforms play an oversized, and largely unknown, role in society, from identity-building to voting choices. Only by working towards more transparency can we ensure, as a society, that there is an evidence-based debate on the role and impact of large platforms – which is a necessary step towards holding them accountable. Only if we understand how our public sphere is influenced by their algorithmic choices, can we take measures towards ensuring they do not undermine individuals’ autonomy, freedom, and the collective good.
European lawmakers have the chance, with the Digital Services Act, to ensure that public interest researchers, including academia, journalists, and civil society organizations, have access to the data we need from large platforms. Read our demands on how they could do just that.
Media contact: firstname.lastname@example.org | +49 (0)30 99 40 49 001