Bad data and health: Garbage in, carnage out

by Nicolas Kayser-Bril

French investigative outlet Mediacités recently revealed how design failures in the user interface of software used in the country’s second-largest hospital endangered patients’ lives. Such software is used to gather the data that feeds machine learning algorithms, raising questions about the efficiency of artificial intelligence solutions in the health sector and what they mean for patients’ health.

 

The director of the public hospital of Hyères, a city of 60,000 in the south of France, recently complained about design failures of the hospital’s main information-management software, called “Easily”, Mediacités reported. The software’s front-facing interface is so poorly designed, he wrote, that the names of certain drugs are truncated, making it hard for nurses to see what exact product the doctor prescribed. The software also contained labelling errors for certain medications, increasing the risk that patients are given the wrong doses. These failures caused at least three serious incidents, he added.

Easily, the software in use at Hyères, runs in close to 100 public hospitals in France. When grave usability problems are reported, it is not uncommon for more than a year to go by before the issue is fixed.

Poorly programmed software can be fatal. In 2011, a patient died after she was given penicillin, to which she had a strong allergy. The information was on her file, but the hospital’s software did not alert the doctor on call about this, Le Parisien reported (the hospital was later convicted of involuntary manslaughter).

The health sector is a priority of the national AI strategy, in France and in other European countries, as AlgorithmWatch reported in Automating Society. While automated decision-making is currently limited to pilot projects in certain hospitals, the dire state of information management software in France raises doubts about the quality of the data that machine learning algorithms can build upon.

Eclampsia: Four in five cases were false positives

The backbone of French medical information is called the national health data system (SNDS in its French acronym). It was set up in 2016 and merges several databases created in the 1990s, including all patients’ histories and all medical procedures performed at hospitals. While SNDS has several goals, the Villani report (p.200 ff.), which was delivered to the government in 2017 and acts as a sort of national AI strategy, called for using SNDS as the foundation of a future one-stop shop that would feed data to artificial intelligence algorithms.

But data in SNDS has many quality issues, and no one knows precisely how many.

Dr. Anne Chantry, a researcher at the French national institute of health and medical research (Inserm), led a study of severe maternal events in four French hospitals based on data from 2006 and 2007. She showed that some conditions were vastly over- or under-reported in the central database. She and her colleagues assessed the quality of the data pipeline by comparing the data points stored in the central management system with the original hospital files. They did not challenge the original diagnosis. Among the 84 cases coded as eclampsia (seizures or coma during pregnancy or shortly after birth), 67 were actually cases of pre-eclampsia, a condition of high blood pressure (among other symptoms) that might lead to eclampsia. Conversely, postpartum hemorrhage had been correctly coded 129 times, but 75 additional cases had not been coded as such.
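
The figures quoted above translate into two simple ratios. The following is only an illustrative sketch in Python, not the study's own method: the variable names are made up for this example, and it treats the original hospital files as the ground truth, as the researchers did.

```python
# Figures quoted in the paragraph above; all names are illustrative only.

# Eclampsia: 84 cases coded in the central database,
# 67 of which were really pre-eclampsia according to the hospital files.
eclampsia_coded = 84
eclampsia_miscoded = 67
eclampsia_false_positive_share = eclampsia_miscoded / eclampsia_coded

# Postpartum hemorrhage: 129 cases correctly coded,
# 75 further cases present in the hospital files but not coded.
pph_coded_correctly = 129
pph_missed = 75
pph_total = pph_coded_correctly + pph_missed
pph_sensitivity = pph_coded_correctly / pph_total  # share of real cases captured
pph_missed_share = pph_missed / pph_total          # share of real cases lost

print(f"Eclampsia codes that were false positives: {eclampsia_false_positive_share:.1%}")
print(f"Postpartum hemorrhages captured:           {pph_sensitivity:.1%}")
print(f"Postpartum hemorrhages missed:             {pph_missed_share:.1%}")
```

Under these assumptions, roughly 80% of the eclampsia codes were false positives (the "four in five" of the heading), while more than a third of postpartum hemorrhages never reached the central database.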

While some of the other conditions that were checked in this study showed lower error rates, it nevertheless demonstrated that the data collected nationally could be severely impacted by false negatives and false positives. The error rates varied significantly among hospitals, too. In a telephone interview, Dr. Chantry pointed out that, while digitization of a hospital’s record keeping was linked with fewer coding errors, the effect held “if and only if” someone with expert knowledge reviewed the data, especially in complex cases.

Worryingly, very little other research has been done to assess the quality of the data in SNDS. A change in the way hospitals are financed is likely making the data quality much worse. In order to break the cronyism that plagued health funding, the French government in the 2000s moved away from lump-sum financing in favor of procedure-based payments. Payments are made based on the data reported to SNDS, leading to a conflict of interest for hospitals, torn between reporting the truth and reporting income-maximizing data.

Dr. Frédéric Pierru, a health sociologist at Lille university, explained in an email interview that hospitals engage in “reporting optimization” (which he also called generalized cheating), sometimes by hiring external contractors that specialize in increasing revenue through creative reporting. Such optimization is pursued so ruthlessly that in 2012, Saint-Malo hospital, in Brittany, fired a medical doctor who opposed the practice on ethical grounds.

Garbage in, garbage out

In the eclampsia research, it took half a dozen scholars several weeks of manual work to double-check 396 cases. Half a million new cases enter the system every week, making post-hoc manual correction impossible.

Automation seems an ill-fitting solution to the problem. When Dr. Chantry shared her findings with the hospitals she investigated, they painstakingly contacted all the professionals involved in the data collection process one by one, she said. The hospitals had come to the conclusion that they could not automate any improvement in data quality.

A widely known adage of computer science states that if garbage goes into an algorithm, garbage comes out. AlgorithmWatch contacted the press offices of IBM and Microsoft, which sell artificial intelligence solutions to hospitals in France, to ask if they used SNDS data and if they were aware of its shortcomings. Microsoft, in particular, announced in 2018 that it had partnered with the Lyon public hospitals to “dynamize” tools such as Easily in order, for instance, to use AI for diagnosis assistance. Toning down these bold statements, Antoine Denis of Microsoft France said that the company was only providing cloud computing services. IBM declined to comment.

Health professionals in Denmark have already pointed to IBM Watson’s training data as the source of many errors, Danish weekly Ingeniøren reported. After a trial at Copenhagen’s Rigshospitalet, during which IBM’s automated diagnosis solution failed to impress, doctors said the software’s reliance on data from New York City’s Memorial Sloan Kettering cancer center, which did not fit Denmark’s reality, might be to blame.

While artificial intelligence is the focus of much attention in the health industry, whether dressed up as a savior or as a bogeyman, the results it yields depend on the quality of the underlying data it uses to feed its algorithms. Good data quality, in turn, has much to do with the user interface of the software health professionals interact with in their everyday tasks. Any sound AI policy must take into account the nitty-gritty of software such as Easily, including the seemingly mundane issue of truncated fields.

 


Photo by Daan Stevens on Unsplash

Published: June 11, 2019