Under the Twitter streetlight: How data scarcity distorts research

As part of our #LeftOnRead campaign, several researchers testified to the reluctance of online platforms to provide useful data. Many resort to studying Twitter, which is more accommodating than most.

Nicolas Kayser-Bril
Head of Journalism (on parental leave)

Tiziano Bonini, an associate professor at the University of Siena, began an ethnographic investigation of online music platforms in late 2017, together with his colleague Alessandro Gandini of the University of Milan. As the discovery and consumption of cultural artifacts, including songs, becomes automated, they wanted to find out how Spotify, Apple Music and other services operated, from their recommendation algorithms to their work practices.

Like many other researchers, they were “left on read”. Their formal requests to interview developers and playlist curators were either denied or not answered. They circumvented the problem by talking with other professionals in the music industry and published their findings in 2019.

Mr Bonini and Mr Gandini’s story is very common among scholars of online platforms. Few companies allow them to access data, and we are not aware of any that allows “participant observation”, a method in which researchers stay on the organization’s premises for weeks or months at a time.

Streetlight effect

Facebook, Twitter and YouTube are among the few services that share some data with researchers. Facebook sometimes gives access to CrowdTangle, a tool to extract posts from public groups. YouTube lets some researchers use its application programming interface (API), while Twitter has the most forthcoming data access regime of all platforms.

This differential access regime produces a “streetlight effect”, “an observational bias that occurs when people only search for something where it is easiest to look,” according to Wikipedia.

Close to nine in ten academic articles published about social networks focused on Facebook, Twitter or YouTube. While precise usage data on online platforms is not available, it is very unlikely that these three platforms, especially Twitter, deserve such a share of the academic attention.

A review of academic articles published by major journals (from the databases of Sage Publishing, Taylor & Francis and Elsevier) shows that Twitter is heavily over-represented. (The review is far from comprehensive and limited to English-language sources, but the effect is large and can be seen in databases in France and Latin America as well. Our data is open.)

Close to 2,300 articles containing “Twitter” in their titles have been published, against 53 for Snapchat. Twitter has about 150 million daily active users, half as many as Snapchat, which does not report monthly users.

Out of the 17 social networks we searched for, Twitter studies represented a third of the academic output. Twitter might be much older than other services (it was created in 2006, long before TikTok or WeChat), but its popularity among scholars remains remarkably stable. The share of articles mentioning Twitter in their titles went from 34.2% (of all articles containing the names of one of the platforms in their titles) in the years up to 2018 to 33.9% in 2019 and 2020.

Badoo, one of the largest dating services in the world, has not been featured in the title of a single academic article available in the databases we researched.

Media appetite

Mr Bonini of the University of Siena said that media scholars are especially interested in political communication, which could explain the over-representation of Twitter and Facebook. But other platforms that play a major role in political communication remain under-researched. A 2019 report by New York University’s Stern School of Business described Instagram, which is notoriously difficult to research, as “more of a disinformation magnet than Facebook, Twitter, or YouTube” in the context of the upcoming US election.

Similarly, 4chan, a message board where much of the alt-right discourse originated in the early 2010s, was the subject of only six articles in the databases we searched.

Mr Bonini offered another possible explanation. “The media are more likely to cite a new paper on political communication on Twitter or Facebook,” he told AlgorithmWatch. This may be because journalists and scholars themselves favor these platforms over Snapchat or TikTok.

Participant observation

Inês Narciso, an assistant professor at the University Institute of Lisbon who was left on read by Facebook in a recent research project, points out the lack of technical skills among some social scientists. More cooperation with technical departments could improve the variety of research methods available to them.

She also said that ethics boards should update their rules on participant observation to better reflect the nature of social networks where there is no expectation of privacy. In one case, Ms Narciso planned to join large WhatsApp groups to study disinformation, but had to abandon the plan after colleagues voiced ethical concerns. She relied on data donations from individual users instead.

As almost all platforms prevent researchers from accessing data, participant observation might become the best way to investigate. Nick Seaver, an assistant professor at Tufts University, described algorithms as “people”, who should be studied with the tools of ethnography. While researching his PhD on music recommendation services, he managed to get an internship at a large music streaming company.

Technography

Ana Pop Stefanija, a doctoral candidate at the Free University of Brussels-VUB, studies how the algorithms of online platforms make extensive profiles of their users and how they influence their users' interactions and behavior. She relies on “technography”, a portmanteau of technology and ethnography that describes the application of ethnography to technical systems.

In her research, Ms Pop Stefanija relies on data requests by users (under article 15 of the General Data Protection Regulation), among other sources. Researchers should not rely on data from platforms only, she told AlgorithmWatch. Platforms have such overwhelming power (they can discontinue a data access point, for instance) that, in the end, they are the ones who decide what researchers are allowed to know.
