Between care and control: 200 years of health data in France

The French “Health Data Hub” will soon offer health data on all French citizens to AI startups that request it. It is the latest step in a project to centralize health information that began 200 years ago and kept oscillating between care and control, but mostly control.

Coming back from the 1876 World Hygiene Congress, French doctor Olivier du Mesnil was in awe of Brussels' health information system. There, he wrote, doctors immediately sent a copy of death certificates to the central hygiene bureau. Every week, bureau personnel produced a map of the city where each death was marked with a pin, with different colors for each contagious disease, and showed it to the mayor. This technology, Mr du Mesnil said, allowed even “the person least familiar with scientific research to exert non-stop control over the health status of the population.”i

Mr du Mesnil used a Belgian example to underline how much France had to catch up. Like its neighbors to the north, France was a rapidly industrializing country in the 19th century. Railroads sped up transport and factories brought together large numbers of workers packed in unhealthy, precarious dwellings.

Data collection, 1930

Controlling cholera

The main beneficiaries of this combination were bacteria, which could jump from host to host rapidly over wide distances. Two cholera epidemics, in 1832 and 1854, claimed over 100,000 lives each.ii The bacterium responsible for the illness was not identified until 1884, but governments across Europe understood the need to collect data to follow the spread of cholera and order quarantines in time to stop its progression. After the first cholera epidemic, the French government began gathering information throughout the country on what were thought to be the causes of the sickness: bad air, rotten food and even “ignorance”.iii

Despite its lethality, cholera was far from the main killer of the 19th century. Dysentery and tuberculosis were an order of magnitude more dangerous but were limited to the poorest reaches of society.iv That the government only took action regarding cholera – which killed across all social classes – was taken by many as proof that the health of the poor was only a concern when it impacted the rich. Distrust of government measures against cholera, including data gathering, ran high.v

Health police

Until well into the 20th century, health was a matter of public order rather than well-being. Until the first world war, health was the purview of the ministry of the interior. To monitor and predict the spread of tuberculosis in large cities, authorities built a “health record” for each building, modeled on the criminal record of each individual. Workers would need to present both to obtain a job. (The health record was discontinued around 1908 after hospital personnel pointed out that building-level data was not much use.)vi

A change of perspective occurred in the first decades of the 20th century. First, warfare now required conscripting not just part of an age cohort, but all of it. The general health level of the population acquired a new military importance. Second, eugenics, a pseudo-scientific craft that claimed to improve a population by rooting out its unhealthy members, gained in popularity. Health and hygiene became political goals in themselves and gained their own ministry in 1920.vii

Knowledge monopoly

Health statistics, once concerned only with epidemics and controlling the poor, started to record well-being. The League of Nations, a newly created and optimistic international organization, led the movement and commissioned health surveys across countries.

Not all French doctors were enthusiastic about the change.viii They complained that such data collection would endanger doctor-patient confidentiality, but their main concern may well have been a loss of status. At the time, doctors were the ultimate repository of medical knowledge. Passing on information to the state was seen as a devolution of power. Because doctors were almost entirely financed by their patients, they had little incentive to cooperate in systems they disliked.

In any case, the collection of health data for the well-being of the population was limited to a fraction of French taxpayers. In the colonies, health was still seen as a production factor, to be optimized inasmuch as it made plantations and mines more profitable. Until 1946, all French colonies together employed at most four statisticians, for whom health data was probably not a priority.ix

Knitting needles and data processing

Some in the medical sciences saw an opportunity in structuring data. In 1929, Louis Bazy, a surgeon consulting for the Paris-Orléans railway company, had the idea of using his employer’s “statistics machines” to aggregate data on the health of the company’s workforce. He designed a system in which each employee’s illness was coded on a punch card, along with her or his personal data. The statistics machine could process 400 cards a minute and, with the help of the "tabulating machine," provide information on the spread of a disease or correlate variables. The scope of applications for medicine and research was endless, he wrote.x

Not every doctor had access to statistics machines, so a professional magazine for progressive doctors explained how to process punch cards with knitting needles.xi Despite such efforts, there is no evidence that these early attempts at computerized medicine gained many followers.

Certainly not GDPR-compliant: Mr Bazy’s structuration of health data on punch cards

The drive towards centralization

During the Second World War, the French government made great strides in implementing eugenics. A side effect of this policy was the creation of a National Hygiene Institute (Institut national d’hygiène, INH). From 1942, it conducted large-scale data collection to track the effects of the government’s crackdown on alcoholism and venereal diseases. It also built a central repository of information on 35,000 cancer patients.xii

After the war, INH expanded and kept monitoring the nation’s health (it became the French National Institute of Health and Medical Research, Inserm, in 1964). At the same time, the post-war government offered social insurance to all its citizens. With it came a social security number given at birth, which remains immutable until death. Having a unique identifier for every citizen revived the old dream of governance through numbers, where rational decisions could be taken based purely on data.

In France, as in other countries of the Western bloc, central planning was considered a necessity. The government felt it had to collect comprehensive data on morbidity (that is, on the illnesses affecting the population). A first attempt in 1945 to force hospital doctors to fill out forms after each procedure, to be sent to a central authority, failed. Another attempt was made in 1958, and another in 1972. As in the 1930s, doctors did not comply with their new obligations. They criticized the methodology, complained about the added workload, and said they failed to see any benefit for themselves.xiii


This changed in the 1980s. A new attempt at centralizing morbidity data started in 1982. By the beginning of the next decade, all hospitals were feeding data to a central authority.

This success – for the state – might have to do with the economic environment of that decade. The slow economic growth provided an incentive for the government to cut costs in healthcare. Despite initial reluctance from doctors, several health ministers pushed through and made the new system mandatory in 1991.xiv

The data gathering effort was first and foremost a cost control mechanism. Knowing how many procedures each hospital carried out, and how much money each hospital received, the health ministry could rank their performance, in financial terms at least. Hospitals that overspent were called to order until, in the 2000s, the government introduced payment by procedure. A level-1 heart attack is worth €1,205.57, increased to €1,346.85 if the patient dies within two days.xv Each procedure that doctors perform is coded according to a strict classification, and hospitals are paid by the social security system accordingly.
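To put the size of that premium in perspective, here is a small sketch that only does arithmetic on the two tariffs quoted above. The function name `tariff_gap` is invented for illustration, and nothing here reflects how the social security system actually computes payments.

```python
from decimal import Decimal

# The two tariffs quoted in the text, in euros.
BASE_TARIFF = Decimal("1205.57")
DEATH_WITHIN_TWO_DAYS_TARIFF = Decimal("1346.85")


def tariff_gap(base, higher):
    """Return the absolute gap and the gap as a percentage of the base tariff."""
    difference = higher - base
    # Round the percentage to one decimal place.
    percent = (difference / base * 100).quantize(Decimal("0.1"))
    return difference, percent


difference, percent = tariff_gap(BASE_TARIFF, DEATH_WITHIN_TWO_DAYS_TARIFF)
print(f"Difference: €{difference} ({percent}% above the base tariff)")
```

On these two figures, the payment rises by €141.28, or roughly 11.7%.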

To navigate the list of over 6,000 procedures, hospitals hire external consultants to “optimize” their coding practices. As AlgorithmWatch reported in May 2019, code optimization is nothing less than “generalized cheating” to maximize revenue, according to a health sociologist at the University of Lille.

Quality concerns

Because France has a mandatory national health insurance system with a single paying authority, the morbidity data can be linked to medication usage as soon as a drug is reimbursed by social security. For about 99% of the population, the French national health insurer has comprehensive information on hospital procedures and drug intake since the early 1990s.xvi

This unique data set allowed researchers to find hidden correlations. This is how Benfluorex (sold under the name Mediator) was linked to heart valve disease, leading to the withdrawal of the drug in 2009.

However, all the information on hospital procedures is accounting data, not medical data. The optimization of procedure encoding does a great disservice to data quality, but no one knows exactly how bad the situation is, as very few studies have been conducted. One such study, from 2011, showed that, on one specific procedure, the wrong code was used eight times out of ten.xvii


Despite this abysmal performance, in 2019 the French government pressed to build an even bigger database, called the “Health Data Hub”. Cédric Villani, a mathematician who spearheaded the Artificial Intelligence strategy of president Emmanuel Macron, wrote in a parliamentary report that the real risk of AI in health would be “not to welcome it”.xviii The Hub aims at providing any health-related data to AI projects that request it.

Since 2004, the government has pushed for all French residents to open an electronic health record (EHR). After a slow start, the centralized EHR will be made opt-out in 2021, and should, in time, be connected to the Health Data Hub.xix

The French data protection authority criticized the project because of its broad aims. Data from the Hub can be used for any “public interest” goal, opening the door to any commercial application. Critics also pointed out that personal data in the Hub is pseudonymized but not aggregated, so that individuals can easily be re-identified.xx
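A minimal sketch of why pseudonymized but unaggregated records are fragile, using a toy data set: the records, field names and the `reidentify` helper below are all hypothetical and say nothing about the Hub's actual schema. It shows a simple linkage attack, in which someone who already knows a few facts about a person filters the records until only one pseudonym remains.

```python
# Hypothetical pseudonymized records: names are gone, but each row still
# describes a single person through indirect details.
records = [
    {"pseudo_id": "a91f", "birth_year": 1961, "postcode": "59000", "admitted": "2019-03-02"},
    {"pseudo_id": "7c2e", "birth_year": 1961, "postcode": "59000", "admitted": "2019-06-17"},
    {"pseudo_id": "d405", "birth_year": 1984, "postcode": "75011", "admitted": "2019-03-02"},
]


def reidentify(records, known_facts):
    """Return the pseudonyms of all records matching every known fact."""
    return [
        record["pseudo_id"]
        for record in records
        if all(record.get(key) == value for key, value in known_facts.items())
    ]


# Facts an employer, insurer or neighbor might plausibly know (assumed here).
known = {"birth_year": 1961, "postcode": "59000", "admitted": "2019-06-17"}
matches = reidentify(records, known)
print(matches)
```

With only the birth year, two records match and the person stays hidden in a small crowd; adding the postcode and admission date leaves a single pseudonym, and with it that person's whole record. Aggregated data, which publishes counts rather than rows, would not allow this kind of filtering.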

Toxic relationships

A doctor who wished to be identified only as Gilles started a “data strike” when the Health Data Hub was officially launched in December 2019. He and others called on colleagues to stop filling out the forms that feed the Hub. Since the 1980s, he said, France has moved from “a healthcare that cures to a healthcare that counts,” pointing to the cost management systems. He saw no benefits in the new database, saying that research does not need it. “Software only robs time that should be spent on caring for patients,” he added.

Although he declined to give any numbers on the success of the strike, Gilles’ anger is widely shared. In January 2020, over 1,000 doctors resigned their administrative duties, citing the pay-per-procedure system as one of the largest problems.xxi

It was also revealed that the designer of the Health Data Hub quit his job to work for a private sector firm specializing in selling health data. However, he saw no conflict of interest.xxii

Health data shrug

The main breakthrough of the Health Data Hub is that, for the first time, a French government used an English name for an official project.xxiii The rationale that led to its creation is a direct continuation of 200 years of efforts by the French government to gather data on its citizens, to make its population more legible and more governable.

No one knows what the Health Data Hub will bring, but history offers some insights. The information system that Brussels set up in the 1870s, which Mr du Mesnil so admired, might have worked. The city was spared any large epidemic until the Spanish flu of 1918. But then again, so were all large cities in France. On the other hand, life expectancy in Brussels, relative to the Belgian countryside and to other large cities, decreased between 1885 and 1910.xxiv

It could be that health data and actual health do not always go hand in hand.

Nicolas Kayser-Bril


Nicolas is a data journalist working for AlgorithmWatch as a reporter. He pioneered new forms of journalism in France and in Europe and is one of the leading experts on data journalism. He regularly speaks at international conferences, teaches journalism in French journalism schools and gives training sessions in newsrooms. A self-taught journalist and developer (and a graduate in Economics), he started by building small interactive, data-driven applications for Le Monde in Paris in 2009. He then built the data journalism team at OWNI in 2010 before co-founding and managing Journalism++ from 2011 to 2017. Nicolas is also one of the main contributors to the Datajournalism Handbook, the reference book for the popularization of data journalism worldwide.