Data Trusts: Why, What and How

How do we, the general public, gain greater control over the estimated 2.5 quintillion bytes of data that is recorded, stored, processed and analysed every day? Anouk Ruhaak introduces the concept of data trusts as a way to claw back some control over the digital utilities that we rely on for our everyday lives.


14 November 2019


This article was first published on 12 November 2019.

Take Uber. If Uber does something that you, a regular user, do not like, Uber does not view this as up for discussion. Your only recourse is to delete the app, and that act of defiance is unlikely to have a large impact. That is, if you can even afford it: what if Uber were your only way to get to work?

In this article I put forward the concept of data trusts as a way to claw back some control over the digital utilities that we rely on for our everyday lives. A data trust is a structure whereby data is placed under the control of a board of trustees with a fiduciary responsibility to look after the interests of the beneficiaries: you, me, society. Using them offers all of us the chance of a greater say in how our data is collected, accessed and used by others. This goes further than limiting data collection and access to protect our privacy; it promotes the beneficial use of data, and ensures these benefits are widely felt across society.

In a sense, data trusts are to the data economy what trade unions are to the labour economy.

Who’s in control?

Any inquiry into the appropriate collection and flow of data should attempt to answer these questions:

  1. Who can collect data?
  2. Who can access it?
  3. Who can use it?

The first question acknowledges that the very act of recording data can have far-reaching consequences. For one, it’s hard to erase data once it is collected, such that collection always implies use (at a minimum, the storage of data). Secondly, the act of recording itself can be viewed as violating our autonomy. We behave differently when we know we’re on camera, or when we assume our everyday conversations are on the record (an ever-more reasonable assumption).

The second and third questions determine how information is used and distributed once it is collected. In addition to determining who has access and can use data today, we need to know who can make as-yet-unspecified decisions about future access and use. For example, you may have access to data about you, but not enjoy the right to decide who else can access that data. Alternatively, it could be entirely up to you to decide who can use data about you, or some specific dataset, and you could revoke that right of use whenever you so desire.

Clearly, the power to decide who can collect, access and use data is more important than merely holding collection, access and use rights right now. Which raises the question: who gets to make those decisions about our data? Oftentimes, the de facto answer to this question is ‘a corporation’, be it Google or Facebook or Amazon. Most of the sensors collecting data are under corporate control and most of the resulting data is held by corporations as well. Especially in jurisdictions without explicit data protection legislation, this reality has meant that corporations decide what data is collected, who can access and use the collected data, and for what purpose. Even when data is collected within the context of a public project (e.g. smart cities), it is often the consulting corporation that decides what to collect, who can access it, how it is used, and by whom, with little public oversight. That’s a problem. The directors of a corporation have a fiduciary responsibility to act in the interests of its shareholders. Their job is not to ensure your privacy or to make data available for the public good, but to make money. In fact, even when a company’s shareholders decide they do want to put those values above their need to turn a profit, we cannot trust that they will continue to do so in the future. What happens to their good intentions when their corporation is sold?

Privacy policies coming into force today solve part of the problem, by handing individuals the right to decide how they want to share or not share data about them, and what they allow to be collected in the first place. However, our ability to exercise these rights depends on whether those decisions are made in freedom. Unfortunately, our reliance on a handful of social media platforms and digital services has resulted in power imbalances that undermine any meaningful notion of consent. Our ability to freely choose how and when we share our data breaks down when the ‘choice’ is between surrendering data about ourselves and social exclusion, or even unemployment (as is the case when we decide to opt out of workplace surveillance). Without a real way to opt out, our consent is meaningless.

Meanwhile, the enforcement of privacy policies leaves much to be desired. Many enforcement bodies rely on complaints, instead of preemptive audits, and are severely understaffed.

In relation to the questions posed above, data protection laws give us the rights we need to grant and revoke access to and use of data. However, without addressing the underlying power imbalances we remain ill-equipped to exercise those rights.

How to level the playing field?

Three alternative solutions have been proposed to level the playing field. Some look to antitrust laws to break up Big Tech. The idea is that many smaller tech companies would allow for more choice between services. This solution is flawed. For one, services like search or social media benefit from network effects. Having large datasets to train on means search recommendations get better. Having all your friends in one place means you don’t need five apps to contact them all. I would argue those are all things we like and might lose if Big Tech is broken up. What we want is to be able to leave Facebook and still talk to our friends, not to have many Facebooks. At the same time, more competition is likely to make things worse. When many services need to compete for your attention, it’s in their best interest to make those services as addictive as possible. This cannot be the desired outcome.

Instead of creating more competition, some argue we should just nationalize Big Tech. This strategy leaves us with two important questions: which government should do the nationalizing? And do we want a government in control of data about us?

Finally, we could decide to divorce those who wish to use data from those who control its use. Personal Data Stores (e.g. Solid or MyData) aim to do just that. By placing the data with the internet user, rather than the service provider, they hope to put the user back in control. This approach has a lot of merit. However, it fails to account for our limited ability to decide how we would want to share data. Do we have enough knowledge and insight to weigh our options? And even if we did, do we really want to spend our time making those decisions?

Data Trusts

As with personal data stores, by placing data in a data trust we separate the data users from those who control the data. The difference is that with a trust, we avoid placing the entire burden of decision-making on the individual. Moreover, by pooling data from various sources together in a data trust, we unlock the ability for a data trustee to negotiate on behalf of the collective, rather than an individual.

A data trust is created when someone, or a lot of someones, hand over their data assets or data rights to a trustee. That trustee can be a person or an organisation, who will then hold and govern that data on behalf of a group of beneficiaries and will do so for a specific purpose. The beneficiaries could be those who handed the data to the trust, or anyone else (including society at large). Importantly, the trustee has a fiduciary responsibility to look out for the interests of the beneficiaries, much like your doctor has a fiduciary responsibility to do what is best for you. That also means that the trustee is not allowed to have a profit motive or, more generally, a conflicting interest in the data or data rights under its custody.

One important feature of a data trust is that the trustee can decide who has access to the data under the trust’s control and who can use it. And, importantly, if that data user fails to comply with the terms and conditions, the trustee can revoke access. To return to the Uber example, instead of you leaving Uber in protest, a trustee can threaten to revoke access to the data of many. Such a threat will carry a lot more weight than the act of a single user.
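For readers who think in code, the grant-and-revoke relationship can be sketched as a toy model. This is only an illustration under simplifying assumptions, not a design for a real trust: the class `DataTrust`, its methods, and the names `rider_board` and `ride_app` are all hypothetical, and the fiduciary duty itself is a legal obligation that no code can capture.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataTrust:
    """Toy model of a data trust; every name here is hypothetical."""

    trustee: str
    purpose: str
    # Pooled records, keyed by the person who placed them in the trust.
    records: dict[str, list[str]] = field(default_factory=dict)
    # Data users to whom the trustee has currently granted access.
    licensed_users: set[str] = field(default_factory=set)

    def contribute(self, contributor: str, record: str) -> None:
        """Add one record to the pool on behalf of a contributor."""
        self.records.setdefault(contributor, []).append(record)

    def grant(self, acting_party: str, data_user: str) -> None:
        """Grant access; only the trustee may do this."""
        self._require_trustee(acting_party)
        self.licensed_users.add(data_user)

    def revoke(self, acting_party: str, data_user: str) -> None:
        """Withdraw access to the whole pool at once."""
        self._require_trustee(acting_party)
        self.licensed_users.discard(data_user)

    def read(self, data_user: str) -> list[str]:
        """Return all pooled records if the data user holds access."""
        if data_user not in self.licensed_users:
            raise PermissionError(f"{data_user} has no access to this trust")
        return [r for recs in self.records.values() for r in recs]

    def _require_trustee(self, acting_party: str) -> None:
        if acting_party != self.trustee:
            raise PermissionError("only the trustee decides on access")


# Usage: two riders pool their data; the trustee grants, then revokes, access.
trust = DataTrust(trustee="rider_board", purpose="fair treatment of riders")
trust.contribute("alice", "trip: home to work")
trust.contribute("bob", "trip: work to gym")
trust.grant("rider_board", "ride_app")
print(len(trust.read("ride_app")))  # prints 2

trust.revoke("rider_board", "ride_app")
try:
    trust.read("ride_app")
except PermissionError as err:
    print(err)  # the data user lost access to every rider's data at once
```

The point of the sketch is the asymmetry: a single `revoke` call by the trustee removes access to the pooled data of all contributors, which is what gives the collective its bargaining power.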

The Road Ahead

How do we get from here to a world in which our data is governed by data trusts? Needless to say, there is still a lot to figure out. How do trustees make decisions about data collection and access? How do we make sure we can continue to trust the trust? Are data trusts possible within our current regulatory environment, and to what extent does the answer to that question depend on the jurisdiction you are in?

We will not find the answers to these and many other remaining questions just by theorizing. Instead, we need to test various models in real-world scenarios. As a Mozilla Fellow I hope to contribute to this effort by considering the usefulness of a data trust for two specific scenarios:

  1. Data Donation Platform: AlgorithmWatch is looking to build a data donation platform that allows users of browsers to donate data on their usage of specific services (e.g. YouTube or Facebook) to a platform. That data is then employed to understand how users are targeted by those services, or what ads they are being served. Could this data sit in a trust? Who would the trustee be? Who would we want to access and use this data?
  2. Health data: CoverUS, a US-based health startup, is looking to help its members collect health data about themselves and use it to gain better access to health services. We want to find out whether a data trust could hold and govern this data.

It is my hope that by studying the concept of a data trust in these specific contexts I will learn more about the incentives and constraints of the various pilot partners for participating in a trust, and gain a better understanding of the design requirements for a data trust. I further hope to gain more insight into the regulatory and policy requirements for data trusts to work.

How you can help

The success of this project will depend in large part on support from others like you! If you have experience setting up governance structures, building legal trusts, thinking through economic models or building data pipelines and would like to help, then please reach out! You can find me on Twitter as @anoukruhaak (DMs open!) or send me an email.