Our online environment today suffers from enormous information asymmetry: Online platforms assemble information about us, while we know little about them. And while platforms share data with commercial third parties, researchers working in the public interest have had limited data access at best, and at worst, have faced technical barriers and legal threats.
Even if they only provided limited access, Facebook’s CrowdTangle and Twitter’s public API were once, at least, helpful in making “publicly available” platform data accessible to a broad range of researchers, civil society watchdogs, and journalists. But things can change quickly for the worse when platforms are left to self-govern. CrowdTangle has been gutted and Twitter’s API has been heavily restricted, leaving researchers in the dark. The current disordered state of these APIs shows how unreliable access to data has seriously hindered systematic research on platforms and their impacts on society.
EU regulators are now stepping in to help fix this imbalance. The Digital Services Act (DSA) includes extensive transparency obligations for online platforms and search engines, the largest of which will be obligated under Article 40 to provide data access to public interest researchers.
Among these obligations is a requirement that platforms develop better systems for sharing publicly accessible data under Article 40, paragraph 12, which echoes a formal commitment platforms made in 2022 under the EU’s Code of Practice on Disinformation. Despite their promises and legal obligations, it is clear that platforms are taking steps in the opposite direction, moving to restrict the sharing of publicly available data or “delivering” it on unworkable terms.
Platforms' actions highlight the need to establish clear standards for these public APIs in terms of the data they make accessible, how and when access is provisioned, and to whom. To establish such standards, the European Commission has launched a public consultation inviting feedback from stakeholders, including from researchers and civil society, on key technical and procedural aspects of Article 40.
After reviewing the evidence from these stakeholders, the Commission will lay out its data access guidelines in a “Delegated Act” to specify, among other things, the kinds of data platforms must actually produce, and how data (including public data) should be made accessible to vetted public interest researchers in a privacy-protecting manner (read AlgorithmWatch’s submission to the Call for Evidence).
AlgorithmWatch is part of a group of civil society experts actively discussing the implementation of the DSA’s public data sharing scheme in practice. This cohort combines technical expertise and years of experience in platform monitoring, data protection, and human rights. Together, we put forward a set of five recommendations to the European Commission and to the designated platforms directly as they move to implement Article 40(12) of the DSA:
- Public data should be complete, comprehensive, and include historical data.
- Data must be verifiable, for which multiple access methods are needed.
- Permissioned access must come on fair and reasonable terms.
- Platforms must not hinder independent public interest research.
- Data sharing should include a diversity of researchers.
The full recommendations can be read in our open letter.
AMO Association for International Affairs
Institute for Strategic Dialogue (ISD)
Democracy Reporting International (DRI)
Check First oy
Stiftung Neue Verantwortung (SNV)
The Forum on Information and Democracy
Algorithmic Transparency Institute, National Conference on Citizenship
The Coalition for Independent Technology Research
The Institute for Data, Democracy & Politics, George Washington University
Brandon Silverman, Former CEO & Co-Founder of CrowdTangle