Under the Twitter streetlight: How data scarcity distorts research

As part of our #LeftOnRead campaign, several researchers testified to the reluctance of online platforms to provide useful data. Many resort to studying Twitter, which is more accommodating than most.


13 August 2020


Tiziano Bonini, an associate professor at the University of Siena, began an ethnographic investigation of online music platforms in late 2017, together with his colleague Alessandro Gandini of the University of Milan. As the discovery and consumption of cultural artifacts, including songs, become automated, they wanted to find out how Spotify, Apple Music and other services operate, from their recommendation algorithms to their work practices.

Like many other researchers, they were “left on read”. Their formal requests to interview developers and playlist curators were either denied or not answered. They circumvented the problem by talking with other professionals in the music industry and published their findings in 2019.

Mr Bonini and Mr Gandini’s story is very common among scholars of online platforms. Few companies allow them to access data, and we are not aware of any that allows “participant observation”, a method where researchers stay on the organization’s premises for weeks or months at a time.

Streetlight effect

Facebook, Twitter and YouTube are among the few services that share some data with researchers. Facebook sometimes gives access to CrowdTangle, a tool to extract posts from public groups. YouTube lets some researchers use its application programming interface (API), while Twitter has the most forthcoming data access regime of all platforms.

This differential access regime produces a “streetlight effect”, “an observational bias that occurs when people only search for something where it is easiest to look,” according to Wikipedia.

Close to nine in ten academic articles published about social networks focused on Facebook, Twitter or YouTube. While precise usage data on online platforms is not available, it is very unlikely that these three platforms, especially Twitter, deserve such a share of the academic attention.

A review of academic articles published by major journals (from the databases of Sage Publishing, Taylor & Francis and Elsevier) shows that Twitter is largely over-represented. (The review is far from comprehensive and limited to English-language sources, but the effect is large and can be seen in databases in France and Latin America as well. Our data is open.)

Close to 2,300 articles containing “Twitter” in their titles have been published, against 53 for Snapchat. Yet Twitter has about 150 million daily active users, about half as many as Snapchat, which does not report monthly users.

Out of the 17 social networks we searched for, Twitter studies represented a third of the academic output. Twitter might be much older than other services (it was created in 2006, long before TikTok or WeChat), but its popularity among scholars remains remarkably stable. The share of articles mentioning Twitter in their titles went from 34.2% (of all articles containing the names of one of the platforms in their titles) in the years up to 2018 to 33.9% in 2019 and 2020.
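The share described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the script used for the review: the platform list, the function name `title_shares` and the sample titles are all made up for the example, and a simple case-insensitive substring match is assumed.

```python
from collections import Counter

# Hypothetical subset of platform names; the review covered 17 networks.
PLATFORMS = ["Twitter", "Facebook", "YouTube", "Snapchat", "TikTok"]


def title_shares(titles, platforms=PLATFORMS):
    """Share of each platform among titles naming at least one platform.

    A title that names several platforms counts once for each of them,
    so the shares can add up to more than 1.
    """
    counts = Counter()
    matching_titles = 0
    for title in titles:
        lowered = title.lower()
        named = [p for p in platforms if p.lower() in lowered]
        if named:
            matching_titles += 1
            counts.update(named)
    if matching_titles == 0:
        return {}
    return {p: counts[p] / matching_titles for p in platforms}


# Made-up titles, for illustration only (not taken from the review's data).
sample_titles = [
    "Political talk on Twitter",
    "Twitter and Facebook in election campaigns",
    "Snapchat use among teenagers",
    "A history of radio",
]

shares = title_shares(sample_titles)
for platform, share in shares.items():
    print(f"{platform}: {share:.1%}")
```

Running the same function on titles grouped by publication year would give the kind of before-and-after comparison quoted above.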

Badoo, one of the largest dating services in the world, has not been featured in the title of a single academic article available in the databases we searched.

Media appetite

Mr Bonini of the University of Siena said that media scholars are especially interested in political communication, which could explain the over-representation of Twitter and Facebook. But other platforms that play a major role in political communication remain under-researched. Instagram, which is notoriously difficult to research, has been described as “more of a disinformation magnet than Facebook, Twitter, or YouTube” in the context of the upcoming US election in a 2019 report by New York University’s Stern School of Business.

Similarly, 4chan, a message board where much of the alt-right discourse originated in the early 2010s, was the subject of only six articles in the databases we searched.

Mr Bonini offered another possible explanation. “The media are more likely to cite a new paper on political communication on Twitter or Facebook,” he told AlgorithmWatch. This may be because journalists and scholars themselves favor these platforms over Snapchat or TikTok.

Participant observation

Inês Narciso, an assistant professor at the University Institute of Lisbon who was left on read by Facebook in a recent research project, pointed to a lack of technical skills among some social scientists. More cooperation with technical departments could widen the variety of research methods available to them.

She also said that ethics boards should update their rules on participant observation to better reflect the nature of social networks where there is no expectation of privacy. In one case, Ms Narciso planned to join large WhatsApp groups to study disinformation, but had to abandon the plan after colleagues voiced ethical concerns (she had to use data donations from individual users instead).

As almost all platforms prevent researchers from accessing data, participant observation might become the best way to investigate. Nick Seaver, an assistant professor at Tufts University, described algorithms as “people”, who should be studied with the tools of ethnography. While researching his PhD on music recommendation services, he managed to get an internship at a large music streaming company.


Ana Pop Stefanija, a doctoral candidate at the Free University of Brussels-VUB, studies how the algorithms of online platforms build extensive profiles of their users and how they influence those users' interactions and behavior. She relies on “technography”, a portmanteau of technology and ethnography that describes the application of ethnography to technical systems.

In her research, Ms Pop Stefanija relies on data requests by users (under article 15 of the General Data Protection Regulation), among other sources. Researchers should not rely on data from platforms only, she told AlgorithmWatch. Platforms hold such overwhelming power (they can discontinue a data access point, for instance) that, in the end, they are the ones who decide what researchers are allowed to know.

Nicolas Kayser-Bril


Photo: Julia Bornkessel, CC BY 4.0
Nicolas is a data journalist working for AlgorithmWatch as a reporter. He pioneered new forms of journalism in France and in Europe and is one of the leading experts on data journalism. He regularly speaks at international conferences, teaches journalism in French journalism schools and gives training sessions in newsrooms. A self-taught journalist and developer (and a graduate in Economics), he started by doing small interactive, data-driven applications for Le Monde in Paris in 2009. He then built the data journalism team at OWNI in 2010 before co-founding and managing Journalism++ from 2011 to 2017. Nicolas is also one of the main contributors to the Datajournalism Handbook, the reference book for the popularization of data journalism worldwide.