op-ed

“We must save privacy from privacy itself”

Michele Loi is a post-doctoral researcher at the University of Zurich. He argues that proponents of privacy should not put privacy above health – or else risk sliding into irrelevance.

Never have I seen a greater threat to privacy than the current coronavirus pandemic. I am presently a pessimist about privacy: I think that privacy risks disappearing from the realm of values to which decent people are committed. We must act wisely and cleverly, and we must mobilize now.

The current threat to privacy has a Janus face: its two faces are each other’s mirror image. One of the two faces is widely known: fear provides governments with a justification to use intrusive means of digital surveillance that populations would never have accepted before. We may call it the risk of our societies becoming more like China’s. And yet, this is not the only way for privacy to disappear.

Too much respect for privacy

The second risk for privacy is paradoxical: it follows from having “too much respect” for privacy itself. If our concern with privacy prevents urgent political action on public health surveillance, people will blame additional deaths on privacy itself. People will stop valuing privacy as a good, and any commitment to its preservation will become unfashionable. Imagine that Europe survives the pandemic with ten times as many deaths as China. It will be easy for the enemies of privacy to claim “that’s how much privacy costs in human lives”. Defenders of privacy will become few and unpopular.

To sum up: we need to rescue privacy from non-trustworthy governments. Public health surveillance provides the perfect excuse for governments to collect data to abuse, and that’s a clear risk. But, less evidently, we also need to rescue privacy from itself. If our societies do not collect the relevant data fast enough and do not use it intelligently, and if privacy is used as the excuse not to collect such data, it will be privacy as a human value that risks being wiped from the face of the Earth.

A century ago, fascism emerged in opposition to liberalism for a reason, namely the inability of liberalism to deal with the economic challenges of the time. In this century, something similar may happen. A new kind of fascism, centred on digital surveillance, may emerge in opposition to privacy if people start believing that privacy is what prevented politicians from taking public health measures adequate to contain the spread of the disease.

Algorithms in the fight against COVID-19

Even if we achieve the best possible balance between the privacy needs of citizens and the need for data to fight COVID-19, we are still facing the threat of a use of data that is not trustworthy, legitimate, and respectful of human dignity.

Legitimacy, let us assume, can be achieved if all players gaining access to data only use it for public health purposes. This is the first condition that should be required of the algorithms that are used to process such data: they should only be used for purposes of health promotion and not, let us say, to prevent tax fraud, burglary, or other crimes, no matter how high the temptation to do so.

One reason for this is moral: protecting the health of the population is the only thing that morally justifies a collection of personal data as invasive as the one we must – as citizens – be ready to accept and support. It is only for this reason that citizens are (hopefully) willing to allow public authorities to gain access to information about the most intimate details of their lives, such as detailed geolocation data which would help greatly in “contact tracing”.

The second reason is pragmatic: if individuals fear that data collected about their lives will be used against them – in ways that cannot be justified by the immediate protection of the health of others – they may prefer to leave their smartphones and other devices at home.

Trustworthiness and respect for human persons

The second condition, trustworthiness and respect for human persons, is harder to translate into operational directives. So far, attempts by epidemiologists to model the spread of the disease based on our imperfect knowledge of it have been largely unsatisfactory. The errors of traditional approaches suggest that a strategy using big data (including data from self-tracking projects involving, for example, temperature and movement sensors on smartphones) and machine learning techniques might work better. However, these techniques are opaque – some call them “black boxes” – unless explicitly designed to be otherwise. We are not talking here about first having hypotheses about how a disease spreads and then using machine learning to see whether the data fit such hypotheses and to fine-tune their parameters. We are talking about a computer blindly following some general-purpose mathematical strategy until it finds a mathematical operation that, when applied to the data produced by contributing citizens, yields a decent output (e.g. a good proportion of correct predictions).

The mathematical operation in question can be so complex that not a single mathematician on Earth can understand what it means. And it will not – unless explicitly designed otherwise – be of any help in understanding the social or natural mechanism behind the spread of the disease. This is known in the literature as the problem of AI opacity (or AI black boxes), and has led to the birth of a new field of computer science, that of “explainable AI” (X-AI).
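For readers unfamiliar with this kind of blind, performance-driven fitting, here is a minimal sketch in Python. The data, feature names and thresholds are all invented for illustration; the point is only that the procedure selects a model purely by predictive performance, with no epidemiological theory attached to the result.

```python
# Illustrative sketch only: a general-purpose learner fitted to synthetic,
# hypothetical "citizen-contributed" features (temperature, contacts, movement).
# Nothing here models a real disease; it only shows how an opaque model is
# chosen purely because it predicts well on held-out data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.normal(36.8, 0.6, n),   # self-reported body temperature (hypothetical)
    rng.poisson(8, n),          # daily close contacts estimated from the phone
    rng.exponential(3.0, n),    # kilometres moved per day
])
# Synthetic label: infection status generated by an arbitrary hidden rule plus noise.
y = ((X[:, 0] > 37.3) & (X[:, 1] > 6) | (rng.random(n) < 0.05)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# The only criterion the procedure optimises is predictive accuracy; the fitted
# ensemble of hundreds of trees carries no epidemiological meaning by itself.
print("share of correct predictions:", accuracy_score(y_test, model.predict(X_test)))
```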

Avoiding black-box AI through interpretable models

We foresee that black-box AI will raise significant ethical questions about its deployment, especially if it is used as a basis for public health choices including selective lock-downs (when normal economic life restarts, lock-downs will become more targeted) and decisions about whom to keep in isolation.

It will be hard for citizens to trust decisions based on predictions where no one can explain the reason why certain people must suffer more than others. Such deployment of algorithms might also be considered to be disrespectful of the dignity of the affected people.

This is why trustworthiness and the principle of respect for persons may call for the deployment of X-AI. There are roughly two approaches here. The first is to constrain the machine learning method, forcing it to produce a mathematical model to which epidemiologists can attach a clear meaning. In this case, a model’s outcome would not only inform practical decisions about public health measures (based on probabilistic considerations), but also suggest new hypotheses which may lead to an improved understanding of the disease, of the models of the disease, and of their shortcomings. The resulting models could be debated from an interdisciplinary perspective, and possible errors and biases would be easier to find. This kind of X-AI would help make the public health justification of those measures explicit. Explainable AI could help model the risk of contagion of specific social types, based on data collected about the way different kinds of people interact.
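A minimal sketch of what such a constrained, interpretable model could look like, again with purely hypothetical features and data: the model is restricted to a logistic regression over a handful of named variables, so every fitted coefficient is a claim an epidemiologist can read, contest, and compare with domain knowledge.

```python
# Sketch of the "constrained, interpretable model" approach. Feature names,
# data and the data-generating rule are all hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["household_size", "daily_contacts", "uses_public_transport"]

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([
    rng.integers(1, 7, n),    # household size
    rng.poisson(10, n),       # estimated daily contacts
    rng.integers(0, 2, n),    # public transport use (0/1)
])
# Synthetic infection labels drawn from an arbitrary logistic rule.
y = (rng.random(n) < 1 / (1 + np.exp(-(0.2 * X[:, 1] + 0.8 * X[:, 2] - 3)))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Each coefficient is a readable statement that domain experts can inspect
# and compare with prior knowledge about transmission.
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```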

Trusting black box models

The other approach is to use black box AI combined with X-AI methods that, roughly, turn a black box algorithm into one that is understandable. However, these methods are far from perfect: the black box algorithms are not made fully transparent. Rather, we build a simpler approximation of the black box model that we can explain. Another approach, called “counterfactual explanation”, presupposes that a prediction or decision has already been made about a person or group. If a decision to lock down a part of a city were taken, the counterfactual explanation would be an answer to a question such as “what should be different about this part of the city for it not to be affected by a lock down?”
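A toy illustration of the counterfactual idea, with an invented stand-in model and invented district features: starting from a district predicted to need a lockdown, we search for the smallest single-feature change that flips the prediction. Real counterfactual methods optimise over many features and constraints at once; this grid search only conveys the intuition.

```python
# Toy counterfactual explanation under hypothetical inputs. The "model" below
# is a stand-in for a trained classifier, not a real epidemiological model.
import numpy as np

def predict_lockdown(features):
    # Stand-in for a trained model: lock down if a crude risk score is high.
    contacts, incidence, mobility = features
    return 0.05 * contacts + 8.0 * incidence + 0.1 * mobility > 2.0

district = np.array([12.0, 0.15, 6.0])  # contacts/day, incidence, km moved/day

def one_feature_counterfactual(x, steps=200):
    # Search each feature for the smallest change that flips the prediction.
    best = None
    for i in range(len(x)):
        for delta in np.linspace(-x[i], x[i], steps):
            candidate = x.copy()
            candidate[i] += delta
            if not predict_lockdown(candidate):
                if best is None or abs(delta) < abs(best[1]):
                    best = (i, delta)
    return best

feature, change = one_feature_counterfactual(district)
names = ["daily_contacts", "incidence", "mobility_km"]
print(f"Prediction flips if {names[feature]} changes by {change:+.2f}")
```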

The problem with such methods of explanation is that, as the recent literature shows, they can be manipulated at will by the people writing them, and even by the people developing the algorithm to be explained. For example, it is possible to cherry-pick a counterfactual explanation model to hide the role of some factors and put others under the spotlight. We therefore need further control mechanisms if we want to ensure that this type of X-AI will not be used to fool the public and that the explanations provided with such methods can be trusted. This is best achieved by giving machine learning researchers as little incentive as possible to fool the public. And it will call for further methodological studies in the field of X-AI, to expose ingenious attempts to manipulate explanations.

Finally, the algorithms developed based on the data will not be respectful of human dignity if they lead to the stigmatization of individuals and entire groups. We wish to place emphasis on the questions concerning human groups and not individuals. The reason for this emphasis is that privacy advocates and policy makers may be fooled into thinking that apps based on such data raise no problem for privacy as long as the data is anonymized. This is a very dangerous misconception.

Avoiding stigmatization

Consider an algorithm that computes the risk factors associated with social types based on aggregate (anonymous) data about how people interact, collected from their smartphones. The model may end up associating higher risk scores with certain categories of people, e.g. food delivery employees or Uber drivers. We know from past epidemics that when a social type gets associated with a high risk of contagion, it is stigmatized and suffers the additional harm of isolation. In the past, this has affected even nurses, who were discriminated against as potential spreaders. While it is important to learn about the social causes of the spread of the virus, we must prevent the harmful consequences of stigmatization for specific groups of people, especially those who already suffer more than others because of the coronavirus.
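A small sketch, with invented numbers, of why anonymization alone does not remove this risk: even fully anonymized interaction records allow risk scores to be computed per occupational category, and it is exactly these group-level scores that make stigmatization possible.

```python
# Sketch with invented numbers: no individual is identifiable in these rows,
# yet the output attaches a risk score to whole occupational groups.
from collections import defaultdict

# (occupation_category, average_daily_contacts, tested_positive) -- anonymous rows
records = [
    ("delivery", 35, 1), ("delivery", 40, 0), ("delivery", 38, 1),
    ("office",    5, 0), ("office",    4, 0), ("office",    6, 1),
    ("driver",   25, 1), ("driver",   30, 0), ("driver",   28, 1),
]

totals = defaultdict(lambda: [0, 0])  # category -> [positives, count]
for category, _, positive in records:
    totals[category][0] += positive
    totals[category][1] += 1

for category, (positives, count) in totals.items():
    print(f"{category}: estimated positivity {positives / count:.0%}")
```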

We cannot give up our humanity and sacrifice the dignity of entire groups of people for the sake of a marginal improvement in the mechanics of fighting the disease. This means that the data scientists developing the models must constantly ask themselves questions about the variables they define (e.g. should I describe a social type in this or that way?) and about how they communicate their findings.

It also means that, just as we use data science to understand the drivers of the risk of contagion, we should use equally powerful methods to identify the drivers of stigmatization and those conditions (e.g., hypothetically, being an immigrant) that increase the risk of social exclusion. We must also try to predict those risks and plan to prevent or mitigate their harmful effects. Otherwise the use of machine learning and artificial intelligence to help in the fight against the disease will be unethical.

Where does this leave us? Yes, your privacy is at risk, but not just because governments are hungry for your data. It is also at risk because of inaction, for which privacy (and its advocates) will become the scapegoat if we are too late in implementing effective data collection and analysis. Those who care about privacy do not want privacy to be perceived as the enemy of knowledge. Machine learning can be opaque and lead to apparently arbitrary decisions, but it can also help us greatly improve public health measures, something that may enable you to regain more quickly some of the freedoms that have now been curtailed. The main ethical challenges of such systems have been mapped, so we can develop specific ethical requirements (e.g. with respect to X-AI) and hold technology developers accountable for them. Citizens should be empowered to enjoy their human right to participate in science, from each according to their ability, and in an ethically appropriate way. How we act now will decide whether European citizens get used to being passive recipients of surveillance and state-driven decisions, or become active contributors to a new, ethically compliant, citizen-centered ecosystem of data and algorithms promoting their own well-being.

Michele Loi, Ph.D. is Senior Researcher at the Institute of Biomedical Ethics and the History of Medicine, and Research Fellow at the Digital Society Initiative, University of Zurich. In 2015–2016, as a consultant, he coordinated the writing of the World Health Organization Guidance For Managing Ethical Issues In Infectious Disease Outbreaks and helped draft the World Health Organization Guidelines on Ethical Issues in Public Health Surveillance.


Published: March 31, 2020

4 Replies to ““We must save privacy from privacy itself””

  1. Privacy cannot be bent without being destroyed

    However much I sympathize with AlgorithmWatch, I find this argument by Michele Loi somewhat misguided, even dangerous. Privacy is not only a citizen’s right, enshrined in the constitutions of most Western countries. It is also, and even more importantly, a cultural norm inherited from the ancient Greek and medieval city states, and it is very much at the heart of our democratic societies. We are not all always in the same state – sometimes we are public, sometimes we are private. If elected politicians were not able from time to time to withdraw from the public sphere, they would be bound to repeat themselves until death. Without a private sphere – a space to discuss with yourself and hence make new reflections – all dynamism will be taken out of the public sphere.

    This is exactly what has been happening during the last 30-40 years: The quest for “infotainment” on television, on social media and even in the printed press has done everything to publish the private and hence also privatize the public realm. A development very much driven by the unfettered market forces making their way to the traditional scenes of the once indispensable common debates on the future of our society.

    Many journalists writing about these issues confuse privacy and intimacy. Historical memory is short, very short, and some of my young students really do not know the concept of privacy; they do not understand the western distinction between the public and the private. So, in Denmark at least, we’re now in a situation where only a tiny minority express disagreement with the impudent, oftentimes cocky juggling with data demonstrated by our authorities.

    Data has become a kind of gold diggers’ land, and this is especially true of health data. Therefore, we should listen carefully to Edward Snowden’s warnings not to collect and store too much data in the same place. Unlike gold, data can never be protected or “secure”, unless it loses its essence as data.

    Discussing the purpose of a certain data collecting effort as a condition of its legitimacy is very much beside the point. Of course, the purpose is noble. It always is. But we also know from both history and personal experience (I guess) that “the road to hell is paved with good intentions” (Samuel Johnson). In the ethical assessment of technology, one has instead to look for: 1) the possibilities of uses other than those originally intended (misuse); 2) the risks ignored (by necessity) by the researchers; 3) the long-term consequences for society and culture. On all three parameters, the gathering of health data comes out with a dark-red light!

    It is perfectly understandable that each and every research discipline would like to prove itself useful in the fight against COVID-19, including probability theorists and software engineers. It is also understandable that Michele Loi would like to “model the risk of contagion of specific social types, based on data collected about the way different kinds of people interact”.

    Such modelling however, would be ethically improper, even reprehensible. Moreover, it would not help a bit in the fight against virus-spreading.

    To mitigate ethical concerns, Michele Loi suggests the deployment of X-AI (explainable artificial intelligence) and an approach called “counterfactual explanation”. These are good and wonderful ideas, one of them a rather old idea, but when it comes to implementation they are more hype than reality. The proposal to “force” a machine learning method “to produce a mathematical model to which epidemiologists can attach a clear meaning” sounds nice, except that at a very early stage you have to choose between letting the machine learning process work and letting the epidemiologists work. The two cannot possibly work alongside each other. If they did, it would not be machine learning.

    And let us not forget that health data can always be traced back to the specific person they are derived from, no matter what kind of “anonymization” has been conducted. So the statement “We must save privacy from privacy itself” is nonsense. And it obscures the present threat to our common condition – that without privacy there can be no democracy.

    ——

    Klavs Birkholm is founder and director of the Danish think-tank TeknoEtik (teknoetik.org). He was also for eight years a member of the Danish National Council of Ethics.

    1. From the original article:

      – on modelling risk for social types: “While it is important to learn about the social causes of the spread of the virus, we must prevent the harmful consequences of stigmatization for specific groups of people, especially those who already suffer more than others because of the coronavirus. We cannot give up our humanity and sacrifice the dignity of entire groups of people for the sake of a marginal improvement in the mechanics of fighting the disease. This means that the data scientists developing the models must constantly ask themselves questions about the variables they define (e.g. should I describe a social type in this or that way?) and about how they communicate their findings. […] We should use equally powerful methods to identify the drivers of stigmatization and those conditions (e.g., hypothetically, being an immigrant) that increase the risk of social exclusion.”

      – on counterfactual explanations: “The problem with such methods of explanation is that, as the recent literature shows, they can be manipulated at will by the people writing them, and even by the people developing the algorithm to be explained. […] We therefore need further control mechanisms if we want to ensure that this type of X-AI will not be used to fool the public and that the explanations provided with such methods can be trusted. This is best achieved by giving machine learning researchers as little incentive as possible to fool the public.”

      – on interpretable machine learning models: https://www.nature.com/articles/s42256-019-0048-x (N.B. this is still machine learning)

      – on the data diggers: this is the protocol that Google and Apple will probably enable. Please comment on it if you find serious flaws; it was through such a process of criticism and revision that the current version was generated: https://github.com/DP-3T/documents/issues/

      The balancing of privacy and health is a challenging task. You cannot give up privacy, but neither can you give up improving the collection of data about the disease with all the means at our disposal. This will be one of our main weapons in the fight against the virus, and we will be judged by future generations for how good we have been at saving lives, not just for how good we have been at protecting data privacy. Reasonable people expect ethics experts to find reasonable compromises. They also expect civil society organizations, I believe, to contribute their input to the debate, to help assess when a compromise is reasonable, and to denounce one that is not. This is not a moment in which we can afford to describe the world in black and white.

      I’ve started to draft an ethical framework for the collection of data through bottom up citizen science initiatives. Please consider giving your contribution by writing comments to the draft:

      https://docs.google.com/document/d/19F_hXIlPvDKCk8JTfxNXOuex0r_GpmPEUuoSUMEj-zY/edit?usp=sharing

      Thank you so much for your critical attention.

  2. Fully agree on “Such modelling however, would be ethically improper, even reprehensible. Moreover, it would not help a bit in the fight against virus-spreading.” – #causality is not provided by applying models. We just infer something, which might be sufficient in many cases.

    Our societies must step back from the current situation to understand why we distinguish between public and private. The circumstances, i.e. the context, are relevant. There are numerous examples of whistleblowing that should remind us of the need for #transparency and of expressions of concern backed by private data. We should always consider this in a controlled environment, i.e. under supervision of proper conduct.

  3. @Thomas Teske: We should not forget that “the gold diggers” are not government authorities alone. There are a lot of commercial organizations, such as worldwide active enterprises (say Amazon, Microsoft, Apple, Zoom, Facebook, WhatsApp, big providers and so on), which now get personal and private data in amounts 1000 times greater than ever before. These “data lakes” will threaten the privacy (or even intimacy) of people in the future in an unprecedented manner!

    The concern is not the official authorities alone; we should not forget the private and commercial sectors.
