On 5 February 2021, Daniel Koerhuis posted a selfie from the banks of the Waal river in Nijmegen, with children in the background. This was not his first shot. In the past year, Mr Koerhuis posted eleven “selfies with children” on Instagram. The other 13 posts he shared all show pictures of himself, mostly selfies. None mention a political program.
From his Instagram, one would not guess that Mr Koerhuis is a politician in the run-up to a major election. Already a member of the Dutch parliament, he is number 19 on the list of the ruling VVD party (polls currently show VVD winning 40 seats in the election, which would ensure his re-election).
Perhaps Mr Koerhuis knows that posts about politics, or at least posts that do not show human faces, do not perform well on Instagram. This is not only because they bring in fewer likes, but also because Instagram is likely to show them to users less often.
675 data donors
AlgorithmWatch conducted a large experiment starting in mid-2020. Together with Pointer, NOS and De Groene Amsterdammer, three Dutch media of record, and with funding from the SIDN foundation, we asked volunteers to install a browser add-on that scans their Instagram newsfeeds at regular intervals. Each data donor was asked to follow three accounts of Dutch politicians or political parties.
On the one hand, we recorded what politicians posted on Instagram. On the other, we recorded what volunteers saw at the top of their newsfeeds. This way, we could see when a volunteer encountered a post by a politician, and when they did not.
In total, 675 volunteers contributed 38,259 data donations over eight months. The 124 politicians we monitored posted 7,477 image posts on Instagram (we did not record video posts or stories, the short videos that usually remain online for 24 hours).
Faces, not text
The pictures politicians posted were automatically sorted into different categories. As was to be expected, some categories performed better than others. A post showing a face had a 32% chance of appearing in the newsfeeds of our volunteers, while posts displaying only text had a 30% chance of reaching them.
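To make the percentages concrete, here is a minimal sketch of how a per-category appearance rate could be computed. It assumes, hypothetically, that each observation pairs a post's image category with whether it appeared at the top of a volunteer's newsfeed; the record format, the function name and the toy data are illustrations, not the project's actual code or data.

```python
from collections import defaultdict

def appearance_rates(records):
    """Share of observations per category in which the post appeared.

    records: iterable of (category, seen) pairs, where seen is True if the
    post was found at the top of a volunteer's newsfeed (hypothetical format).
    """
    totals = defaultdict(int)
    seen_counts = defaultdict(int)
    for category, seen in records:
        totals[category] += 1
        if seen:
            seen_counts[category] += 1
    # Divide appearances by observations for each category
    return {c: seen_counts[c] / totals[c] for c in totals}

# Toy data for illustration only, not the study's data
records = [("face", True), ("face", False), ("face", False), ("face", True),
           ("text", False), ("text", True), ("text", False), ("text", False)]
print(appearance_rates(records))  # {'face': 0.5, 'text': 0.25}
```

Raw rates like these do not yet account for popularity, timing or how many accounts a user follows, which is why a model that holds those factors fixed is needed.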
Of course, there are many reasons why Instagram decides to display a post to a user. Posts that have been heavily liked or commented on are more likely to be shown. So are posts that have just been published. A user who follows many accounts will only see a fraction of the posts from each; there is only so much room at the top of the newsfeed. The time of day and the day of the week on which a post was published also play a role.
However, even when all these factors are accounted for, Instagram still seems to favor some types of content over others. Pictures depicting faces or people in business attire were more likely to appear in the newsfeeds of our volunteers. By contrast, pictures containing text were less likely to be shown. The detailed analysis can be accessed on GitHub.
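"Accounting for" other factors usually means fitting a regression model in which the effect of an image category is estimated while popularity and similar variables are held fixed. The sketch below uses a logistic regression fitted by plain gradient descent on synthetic data; it is one common way to do this, not necessarily the model AlgorithmWatch used. The features (face, text, a standardized popularity score), the "true" effect sizes and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic function mapping a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit a logistic regression by batch gradient descent.

    Returns weights: index 0 is the intercept, the rest follow the columns of X.
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi  # derivative of the log-loss with respect to z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def make_synthetic_posts(n=1000, seed=42):
    """Hypothetical data in which face posts also tend to be more popular."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        category = rng.choice(["face", "text", "other"])
        face = 1.0 if category == "face" else 0.0
        text = 1.0 if category == "text" else 0.0
        popularity = rng.gauss(0.5 * face, 1.0)  # standardized likes/comments
        # Assumed model: faces boosted and text penalized beyond popularity
        logit = -0.5 + 0.8 * face - 0.8 * text + 1.0 * popularity
        X.append([face, text, popularity])
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y

X, y = make_synthetic_posts()
intercept, b_face, b_text, b_pop = fit_logistic(X, y)
print(f"face: {b_face:+.2f}, text: {b_text:+.2f}, popularity: {b_pop:+.2f}")
```

Because face posts in this toy data are also more popular, comparing raw appearance rates would mix the two effects; the face coefficient instead estimates the boost at equal popularity.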
Mr Koerhuis’ posts appeared more often in the newsfeeds of our volunteers than could be explained by their popularity alone. This might be because the type of pictures he publishes, selfies and pictures of children, is what Instagram favors.
These findings confirm those of an investigation by AlgorithmWatch published in June 2020. It showed that professional content creators were more likely to be seen by their followers when they posted pictures of scantily clad bodies. We replicated these findings in a subsequent experiment, taking into account many more parameters than in the original one (the code of the newer experiment is available on GitHub).
Instagram’s help center states that posts are shown in the newsfeed based on several factors, which may include the “likelihood [a user will] be interested in the content”. Instagram does not explain how it decides what is of interest to a user.
Our results show that “interest” is unlikely to be based purely on a user’s behavior. If that were the case, we would expect more variety in individual preferences, translating into a much weaker effect in aggregate. Some users might show no interest in natural landscapes, for instance, but others probably do, balancing out the overall result.
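The averaging argument can be illustrated with a toy simulation. It assumes, and this is the key assumption, that purely individual tastes for a content category are spread evenly around zero, so that some users like a category and others dislike it; a platform-wide boost, in contrast, is shared by everyone. The function name and all parameters are hypothetical. If tastes actually leaned one way across all users, the aggregate effect would not cancel out.

```python
import random

def aggregate_boost(n_users, platform_boost, spread, seed=0):
    """Average ranking boost for one content category across many users.

    Each user's boost = a platform-wide component shared by everyone
    + an individual preference drawn from a normal distribution centred
    on zero (assumption: tastes differ but do not lean one way overall).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_users):
        total += platform_boost + rng.gauss(0.0, spread)
    return total / n_users

# Purely individual tastes: the aggregate effect ends up close to zero
print(round(aggregate_boost(10_000, platform_boost=0.0, spread=1.0), 3))
# A platform-wide preference survives aggregation
print(round(aggregate_boost(10_000, platform_boost=0.5, spread=1.0), 3))
```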
Our data might reflect the collective preferences of our volunteers (in order to minimize the risk to their privacy, we did not collect any information on them). However, many of the volunteers were new to Instagram or not very active, giving the platform little information from which to guess their interests. More importantly, the behavior of the algorithm remained stable even as our pool of volunteers changed. (If the hierarchy of posts shown in the newsfeed reflected purely user interest, we would have expected our data to change dramatically in November, when hundreds of Dutch volunteers joined. This did not happen.)
A post is much more likely to garner likes and comments if it is shown at the top of a user’s newsfeed. Because Instagram decides what users should be interested in, the platform indirectly decides what becomes popular.
Posts that contain text are less popular on Instagram not only because they are less liked, but probably also because Instagram pushes them down. When politicians post pictures containing lots of text, for instance to detail a policy issue, they start at a disadvantage.
The limitations of data donations
Our experiment is probably the largest of its kind. Instagram is vastly under-studied by academics, mostly because of a lack of access to data. However, relying on data donations comes with many limitations.
First, to protect the privacy of our volunteers, we collected no information on them. We do not know where they live or what their political orientations are. It is likely that our sample is biased in many ways. Instagram’s algorithm might produce different results for other groups of users.
Secondly, many volunteers stopped contributing as time went on. We had 567 active volunteers in November, when NOS, Pointer and De Groene Amsterdammer encouraged their readers to participate. Only 33 were still contributing in February. One reason they quit the project could be that they had to follow politicians whom they personally disliked.
Perhaps more damaging for our investigation, Instagram users can mute accounts. This feature is only available in the mobile app, not in the web version where our data collection took place. As a result, we do not know whether a politician does not appear in the newsfeed because they were muted or because Instagram does not show their posts. While this problem does not affect the general results, it makes it impossible to look in detail at how Instagram treats the most controversial politicians in our sample.
Another limitation has to do with the labeling of the images. We used an external service to automatically label the thousands of images posted by politicians. To validate the results, we manually labeled over 1,300 posts. The main findings, that images of people are pushed up and images of text are pushed down, were confirmed. But some discrepancies are puzzling. Pictures of food appear to be pushed down in the automated analysis, but the 32 posts depicting food that we labeled manually tended to be pushed up. We could not find an explanation for the difference other than chance.
Our experiment demonstrates one thing with certainty: that we need more information about large platforms’ algorithms, and that independent researchers will not be able to provide it. Only regulators will be able to open the black boxes.
Facebook, which owns Instagram, was sent our analysis on 22 February. They did not answer our questions but sent this statement, which we quote in full:
We reviewed AlgorithmWatch’s report and found a number of issues with their methodology. The report fundamentally misunderstands how Instagram works and fails to emphasize that people see posts from the accounts they choose to interact with. If a politician or any account has more engagement, it may be because they are sharing more posts their followers are engaging with. Everyone’s Instagram Feed is unique to them and is based on things such as who they follow, who they interact with the most, and the last time they posted. Despite all of this, we go even further to deeply study algorithmic fairness and work with academics and other experts to help keep bias out of our algorithms.