op-ed

“We must save privacy from privacy itself”

Michele Loi is a post-doctoral researcher at the University of Zurich. He argues that proponents of privacy should not put privacy above health – or else risk sliding into irrelevance.

Never have I seen a greater threat to privacy than the current coronavirus pandemic. I am presently a pessimist about privacy: I think that privacy risks disappearing from the realm of values to which decent people are committed. We must act wisely and cleverly, and we must mobilize now.

The current threat to privacy has a Janus face: its two faces are each other’s mirror image. One of the two faces is widely known: fear provides governments with a justification to use intrusive means of digital surveillance that populations would never have accepted before. We may call it the risk of our societies becoming more like China’s. And yet, this is not the only way for privacy to disappear.

Too much respect for privacy

The second risk for privacy is paradoxical: it follows from having “too much respect” for privacy itself. If our concern with privacy prevents urgent political action on public health surveillance, people will blame additional deaths on privacy itself. People will stop valuing privacy as a good, and any commitment to its preservation will become unfashionable. Imagine that Europe survives the pandemic with ten times as many deaths as China. It will be easy for the enemies of privacy to claim “that’s how much privacy costs in human lives”. Defenders of privacy will become few and unpopular.

To sum up: we need to rescue privacy from non-trustworthy governments. Public health surveillance provides the perfect excuse for governments to collect data to abuse, and that’s a clear risk. But, less evidently, we also need to rescue privacy from itself. If our societies do not collect the relevant data fast enough and do not use it intelligently, and if privacy is used as the excuse not to collect such data, it will be privacy as a human value that risks being wiped from the face of the Earth.

A century ago, fascism emerged in opposition to liberalism for a reason, namely the inability of liberalism to deal with the economic challenges of the time. In this century, something similar may happen. A new kind of fascism, centred on digital surveillance, may emerge in opposition to privacy if people start believing that privacy is what prevented politicians from taking public health measures adequate to contain the spread of the disease.

Algorithms in the fight against COVID-19

Even if we achieve the best possible balance between the privacy needs of citizens and the need for data to fight COVID-19, we are still facing the threat of a use of data that is not trustworthy, legitimate, and respectful of human dignity.

Legitimacy, let us assume, can be achieved if all players gaining access to data only use it for public health purposes. This is the first condition that should be required of the algorithms that are used to process such data: they should only be used for purposes of health promotion and not, let us say, to prevent tax fraud, burglary, or other crimes, no matter how high the temptation to do so.

One reason for this is moral: protecting the health of the population is the only thing that morally justifies a collection of personal data as invasive as the one we must – as citizens – be ready to accept and support. It is only for this reason that citizens are (hopefully) willing to allow public authorities to gain access to information about the most intimate details of their lives, such as detailed geolocation data which would help greatly in “contact tracing”.

The second reason is pragmatic: if individuals fear that data collected about their lives will be used against them – in ways that cannot be justified by the immediate protection of the health of others – they may prefer to leave their smartphones and other devices at home.

Trustworthiness and respect for human persons

The second condition, trustworthiness and respect for human persons, is harder to translate into operational directives. So far, attempts by epidemiologists to model the spread of the disease based on our imperfect knowledge of it have been largely unsatisfactory. The errors of traditional approaches suggest that a strategy using big data (including data from self-tracking projects involving, for example, temperature and movement sensors on smartphones) and machine learning techniques might work better. However, these techniques are opaque – some call them “black boxes” – unless explicitly designed to be otherwise. We are not talking here about first having hypotheses about how a disease spreads and then using machine learning to see whether the data fit such hypotheses and to fine-tune their parameters. We are talking about a computer blindly following some general-purpose mathematical strategy until it finds a mathematical operation that, when applied to the data produced by contributing citizens, yields a decent output (e.g. a good proportion of correct predictions).

The mathematical operation in question can be so complex that not a single mathematician on Earth can understand what it means. And it will not – unless explicitly designed otherwise – be of any help in understanding the social or natural mechanism behind the spread of the disease. This is known in the literature as the problem of AI opacity (or AI black boxes), and has led to the birth of a new field of computer science, that of “explainable AI” (X-AI).
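For readers unfamiliar with this kind of blind, performance-driven fitting, here is a minimal sketch in Python. The data, feature names and thresholds are all invented for illustration; the point is only that the procedure selects a model purely by predictive performance, with no epidemiological theory attached to the result.

```python
# Illustrative sketch only: a general-purpose learner fitted to synthetic,
# hypothetical "citizen-contributed" features (temperature, contacts, movement).
# Nothing here models a real disease; it only shows how an opaque model is
# chosen purely because it predicts well on held-out data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.normal(36.8, 0.6, n),   # self-reported body temperature (hypothetical)
    rng.poisson(8, n),          # daily close contacts estimated from the phone
    rng.exponential(3.0, n),    # kilometres moved per day
])
# Synthetic label: infection status generated by an arbitrary hidden rule plus noise.
y = ((X[:, 0] > 37.3) & (X[:, 1] > 6) | (rng.random(n) < 0.05)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# The only criterion the procedure optimises is predictive accuracy; the fitted
# ensemble of hundreds of trees carries no epidemiological meaning by itself.
print("share of correct predictions:", accuracy_score(y_test, model.predict(X_test)))
```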

Avoiding black-box AI through interpretable models

We foresee that black-box AI will raise significant ethical questions about its deployment, especially if it is used as a basis for public health choices including selective lock-downs (when normal economic life restarts, lock-downs will become more targeted) and decisions about whom to keep in isolation.

It will be hard for citizens to trust decisions based on predictions where no one can explain the reason why certain people must suffer more than others. Such deployment of algorithms might also be considered to be disrespectful of the dignity of the affected people.

This is why trustworthiness and the principle of respect for persons may call for the deployment of X-AI. There are roughly two approaches here. The first is to constrain the machine learning method, forcing it to produce a mathematical model to which epidemiologists can attach a clear meaning. In this case, a model’s outcome would not only inform practical decisions about public health measures (based on probabilistic considerations), but also suggest new hypotheses which may lead to an improved understanding of the disease, of the models of the disease, and of their shortcomings. The resulting models could be debated from an interdisciplinary perspective, and possible errors and biases would be easier to find. This kind of X-AI would help make the public health justification of those measures explicit. Explainable AI could help model the risk of contagion of specific social types, based on data collected about the way different kinds of people interact.
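A minimal sketch of what such a constrained, interpretable model could look like, again with purely hypothetical features and data: the model is restricted to a logistic regression over a handful of named variables, so every fitted coefficient is a claim an epidemiologist can read, contest, and compare with domain knowledge.

```python
# Sketch of the "constrained, interpretable model" approach. Feature names,
# data and the data-generating rule are all hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["household_size", "daily_contacts", "uses_public_transport"]

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([
    rng.integers(1, 7, n),    # household size
    rng.poisson(10, n),       # estimated daily contacts
    rng.integers(0, 2, n),    # public transport use (0/1)
])
# Synthetic infection labels drawn from an arbitrary logistic rule.
y = (rng.random(n) < 1 / (1 + np.exp(-(0.2 * X[:, 1] + 0.8 * X[:, 2] - 3)))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Each coefficient is a readable statement that domain experts can inspect
# and compare with prior knowledge about transmission.
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```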

Trusting black box models

The other approach is to use black box AI combined with X-AI methods that, roughly, turn a black box algorithm into one that is understandable. However, these methods are far from perfect: the black box algorithms are not made fully transparent. Rather, we build a simpler approximation of the black box model that we can explain. Another approach, called “counterfactual explanation”, presupposes that a prediction or decision has already been made about a person or group. If a decision to lock down a part of a city were taken, the counterfactual explanation would be an answer to a question such as “what should be different about this part of the city for it not to be affected by a lock down?”
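A toy illustration of the counterfactual idea, with an invented stand-in model and invented district features: starting from a district predicted to need a lockdown, we search for the smallest single-feature change that flips the prediction. Real counterfactual methods optimise over many features and constraints at once; this grid search only conveys the intuition.

```python
# Toy counterfactual explanation under hypothetical inputs. The "model" below
# is a stand-in for a trained classifier, not a real epidemiological model.
import numpy as np

def predict_lockdown(features):
    # Stand-in for a trained model: lock down if a crude risk score is high.
    contacts, incidence, mobility = features
    return 0.05 * contacts + 8.0 * incidence + 0.1 * mobility > 2.0

district = np.array([12.0, 0.15, 6.0])  # contacts/day, incidence, km moved/day

def one_feature_counterfactual(x, steps=200):
    # Search each feature for the smallest change that flips the prediction.
    best = None
    for i in range(len(x)):
        for delta in np.linspace(-x[i], x[i], steps):
            candidate = x.copy()
            candidate[i] += delta
            if not predict_lockdown(candidate):
                if best is None or abs(delta) < abs(best[1]):
                    best = (i, delta)
    return best

feature, change = one_feature_counterfactual(district)
names = ["daily_contacts", "incidence", "mobility_km"]
print(f"Prediction flips if {names[feature]} changes by {change:+.2f}")
```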

The problem with such methods of explanation is that, as the recent literature shows, they can be manipulated at will by the people writing them, and even by the people developing the algorithm to be explained. For example, it is possible to cherry-pick a counterfactual explanation model to hide the role of some factors and put others under the spotlight. We therefore need further control mechanisms if we want to ensure that this type of X-AI will not be used to fool the public and that the explanations provided with such methods can be trusted. This is best achieved by giving machine learning researchers as little incentive as possible to fool the public. And it will call for further methodological studies in the field of X-AI, to expose ingenious attempts to manipulate explanations.

Finally, the algorithms developed based on the data will not be respectful of human dignity if they lead to the stigmatization of individuals and entire groups. We wish to place emphasis on the questions concerning human groups and not individuals. The reason for this emphasis is that privacy advocates and policy makers may be fooled into thinking that apps based on such data raise no problem for privacy as long as the data is anonymized. This is a very dangerous misconception.

Avoiding stigmatization

Consider an algorithm that computes the risk factors associated with social types based on aggregate (anonymous) data about how people interact, collected from their smartphones. The model may end up associating higher risk scores with certain categories of people, e.g. food delivery employees or Uber drivers. We know from past epidemics that when a social type gets associated with a high risk of contagion, it is stigmatized and suffers the additional harm of isolation. In the past, this has affected even nurses, who were discriminated against as potential spreaders. While it is important to learn about the social causes of the spread of the virus, we must prevent the harmful consequences of stigmatization for specific groups of people, especially those who already suffer more than others because of the coronavirus.
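A small sketch, with invented numbers, of why anonymization alone does not remove this risk: even fully anonymized interaction records allow risk scores to be computed per occupational category, and it is exactly these group-level scores that make stigmatization possible.

```python
# Sketch with invented numbers: no individual is identifiable in these rows,
# yet the output attaches a risk score to whole occupational groups.
from collections import defaultdict

# (occupation_category, average_daily_contacts, tested_positive) -- anonymous rows
records = [
    ("delivery", 35, 1), ("delivery", 40, 0), ("delivery", 38, 1),
    ("office",    5, 0), ("office",    4, 0), ("office",    6, 1),
    ("driver",   25, 1), ("driver",   30, 0), ("driver",   28, 1),
]

totals = defaultdict(lambda: [0, 0])  # category -> [positives, count]
for category, _, positive in records:
    totals[category][0] += positive
    totals[category][1] += 1

for category, (positives, count) in totals.items():
    print(f"{category}: estimated positivity {positives / count:.0%}")
```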

We cannot give up our humanity and sacrifice the dignity of entire groups of people for the sake of a marginal improvement in the mechanics of fighting the disease. This means that the data scientists developing the models must constantly ask themselves questions about the variables they define (e.g. should I describe a social type in this or that way?) and about how they communicate their findings.

It also means that, just as we use data science to understand the drivers of the risk of contagion, we should use equally powerful methods to identify the drivers of stigmatization and those conditions (e.g., hypothetically, being an immigrant) that increase the risk of social exclusion. We must also try to predict those risks and plan to prevent or mitigate their harmful effects. Otherwise the use of machine learning and artificial intelligence to help in the fight against the disease will be unethical.

Where does this leave us? Yes, your privacy is at risk, but not just because governments are hungry for your data. It is also at risk because of inaction, for which privacy (and its advocates) will become the scapegoat if we are too late in implementing effective data collection and analysis. Those who care about privacy do not want privacy to be perceived as the enemy of knowledge. Machine learning can be opaque and lead to apparently arbitrary decisions, but it can also help us greatly improve public health measures, something that may enable you to regain more quickly some of the freedoms that have now been curtailed. The main ethical challenges of such systems have been mapped, so we can develop specific ethical requirements (e.g. with respect to X-AI) and hold technology developers accountable for them. Citizens should be empowered to enjoy their human right to participate in science, from each according to their ability, and in an ethically appropriate way. How we act now will decide whether European citizens get used to being passive recipients of surveillance and state-driven decisions, or become active contributors to a new, ethically compliant, citizen-centered ecosystem of data and algorithms promoting their own well-being.

Michele Loi, Ph.D. is Senior Researcher at the Institute of Biomedical Ethics and the History of Medicine, and Research Fellow at the Digital Society Initiative, University of Zurich. In 2015–2016, as a consultant, he coordinated the writing of the World Health Organization Guidance For Managing Ethical Issues In Infectious Disease Outbreaks and helped draft the World Health Organization Guidelines on Ethical Issues in Public Health Surveillance.


Published: March 31, 2020

4 Replies to ““We must save privacy from privacy itself””

  1. Privacy cannot be bent without being destroyed

    However much I sympathize with AlgorithmWatch, I find this argument by Michele Loi somewhat misguided, even dangerous. Privacy is not only a citizen’s right, enshrined in the constitutions of most Western countries. It is also, and even more importantly, a cultural norm inherited from the ancient Greek and medieval city states, and it is very much at the heart of our democratic societies. We are not all always in the same state – sometimes we are public, sometimes we are private. If elected politicians were not able from time to time to withdraw from the public sphere, they would be bound to repeat themselves until death. Without a private sphere – a space to discuss with yourself and hence make new reflections – all dynamism will be taken out of the public sphere.

    This is exactly what has been happening during the last 30-40 years: The quest for “infotainment” on television, on social media and even in the printed press has done everything to publish the private and hence also privatize the public realm. A development very much driven by the unfettered market forces making their way to the traditional scenes of the once indispensable common debates on the future of our society.

    Many journalists writing about these issues confuse privacy and intimacy. Historical memory is short, very short, and some of my young students really do not know the concept of privacy; they do not understand the western distinction between the public and the private. So, in Denmark at least, we’re now in a situation where only a tiny minority express disagreement with the impudent, oftentimes cocky juggling with data demonstrated by our authorities.

    Data has become a kind of gold diggers’ land, and this is especially true of health data. Therefore, we should listen carefully to Edward Snowden’s warnings not to collect and store too much data in the same place. Unlike gold, data can never be protected or “secure”, unless it loses its essence as data.

    Discussing the purpose of a certain data collecting effort as a condition of its legitimacy is very much beside the point. Of course, the purpose is noble. It always is. But we also know from both history and personal experience (I guess) that “the road to hell is paved with good intentions” (Samuel Johnson). In the ethical assessment of technology, one has instead to look for: 1) the possibilities of uses other than those originally intended (misuse); 2) the risks ignored (by necessity) by the researchers; 3) the long-term consequences for society and culture. On all three parameters, the gathering of health data comes out with a dark-red light!

    It is perfectly understandable that each and every research discipline would like to prove itself useful in the fight against COVID-19, including probability theorists and software engineers. It is also understandable that Michele Loi would like to “model the risk of contagion of specific social types, based on data collected about the way different kinds of people interact”.

    Such modelling however, would be ethically improper, even reprehensible. Moreover, it would not help a bit in the fight against virus-spreading.

    To mitigate ethical concerns, Michele Loi suggests the deployment of X-AI (explainable artificial intelligence) and an approach called “counterfactual explanation”. These are good and wonderful ideas, one of them a rather old idea, but when it comes to implementation they are more hype than reality. The proposal to “force” a machine learning method “to produce a mathematical model to which epidemiologists can attach a clear meaning” sounds nice, except that at a very early stage you have to choose between letting the machine learning process work and letting the epidemiologists work. The two cannot possibly work alongside each other. If they did, it would not be machine learning.

    And let us not forget that health data can always be traced back to the specific person they are derived from, no matter what kind of “anonymization” has been conducted. So the statement “We must save privacy from privacy itself” is nonsense. And it obscures the present threat to our common condition – that without privacy there can be no democracy.

    ——

    Klavs Birkholm is founder and director of the Danish think-tank TeknoEtik (teknoetik.org). He was also for eight years a member of the Danish National Council of Ethics.

    1. From the original article:

      – on modelling risk for social types: “While it is important to learn about the social causes of the spread of the virus, we must prevent the harmful consequences of stigmatization for specific groups of people, especially those who already suffer more than others because of the coronavirus. We cannot give up our humanity and sacrifice the dignity of entire groups of people for the sake of a marginal improvement in the mechanics of fighting the disease. This means that the data scientists developing the models must constantly ask themselves questions about the variables they define (e.g. should I describe a social type in this or that way?) and about how they communicate their findings. […] We should use equally powerful methods to identify the drivers of stigmatization and those conditions (e.g., hypothetically, being an immigrant) that increase the risk of social exclusion.”

      – on counterfactual explanations: “The problem with such methods of explanation is that, as the recent literature shows, they can be manipulated at will by the people writing them, and even by the people developing the algorithm to be explained. […] We therefore need further control mechanisms if we want to ensure that this type of X-AI will not be used to fool the public and that the explanations provided with such methods can be trusted. This is best achieved by giving machine learning researchers as little incentive as possible to fool the public.”

      – on interpretable machine learning models: https://www.nature.com/articles/s42256-019-0048-x (N.B. this is still machine learning)

      – on the data diggers: this is the protocol that Google and Apple will probably enable. Please comment on it if you find serious flaws; it was through such a process of criticism and revision that the current version was generated: https://github.com/DP-3T/documents/issues/

      The balancing of privacy and health is a challenging task. You cannot give up privacy, but neither can you give up improving the collection of data about the disease with all the means at our disposal. This will be one of our main weapons in the fight against the virus, and we will be judged by future generations for how good we have been at saving lives, not just for how good we have been at protecting data privacy. Reasonable people expect ethics experts to find reasonable compromises. They also expect civil society organizations, I believe, to contribute their input to the debate, to help assess when a compromise is reasonable, and to denounce one that is not. This is not a moment in which we can afford to describe the world in black and white.

      I’ve started to draft an ethical framework for the collection of data through bottom up citizen science initiatives. Please consider giving your contribution by writing comments to the draft:

      https://docs.google.com/document/d/19F_hXIlPvDKCk8JTfxNXOuex0r_GpmPEUuoSUMEj-zY/edit?usp=sharing

      Thank you so much for your critical attention.

  2. Fully agree on “Such modelling however, would be ethically improper, even reprehensible. Moreover, it would not help a bit in the fight against virus-spreading.” – #causality is not provided by applying models. We just infer something, which might be sufficient in many cases.

    Our societies must step back from the current situation to understand why we distinguish between public and private. The circumstances, i.e. the context, are relevant. There are numerous examples of whistleblowing that should remind us of the need for #transparency and of expressions of concern backed by private data. We should always consider this in a controlled environment, i.e. under supervision of proper conduct.

  3. @Thomas Teske: We should not forget that “the gold diggers” are not government authorities alone. There are a lot of commercial organizations, such as worldwide active enterprises (say Amazon, Microsoft, Apple, Zoom, Facebook, WhatsApp, big providers and so on), which now get personal and private data in amounts 1000 times greater than ever before. These “data lakes” will threaten the privacy (or even intimacy) of people in the future in an unprecedented manner!

    The concern is not the official authorities alone; we should not forget the private and commercial sectors.
