(When) Will AI Improve Women’s Health? 

Across Europe, AI solutions are increasingly being researched and piloted for diagnosing and treating gynecological diseases. While adoption in healthcare remains slow, there is growing pressure to make these tools commonplace.

Story

27 August 2025

#ai #health #women


“Take some paracetamol.” That’s the advice a woman might receive from an AI triage system when she seeks help for excruciating period pain, says Jessica Morley, a postdoctoral researcher at the Yale Digital Ethics Center.

These systems are designed to streamline access to general practitioners, but when symptoms are not described in exactly the right way, they can become barriers. “Unless she hits the algorithm’s red-flag triggers,” Morley warns, “she may never make it past the gate.”

This is happening across Europe and beyond.

The pain women experience can be due to a number of gynecological diseases, including endometriosis, a chronic condition in which uterine-like tissue grows outside the uterus. It affects one in ten women of reproductive age and takes, on average, 7 to 10 years to diagnose.  

Yet current AI triage systems risk making an already lengthy diagnostic process even longer.

“A doctor might be able to see extreme pain in a patient’s face, even if she doesn’t articulate it clearly. But a triage algorithm doesn’t have that capacity,” Morley stresses. If AI triage systems consistently dismiss her pain as “normal period discomfort,” she might not even be able to get to a general practitioner in the first place.

AI models learn from historical clinical records and triage notes that contain implicit gender-based biases in judgments made by healthcare professionals under pressure. As a result, AI-driven triage may underestimate the severity of (gynecological) conditions in women, potentially delaying appropriate care − as a recent study from the Bordeaux University Hospital confirmed.
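The mechanism described here can be illustrated with a small, entirely hypothetical simulation (the records, rates, and downgrade behavior below are invented for illustration, not drawn from the Bordeaux study): if historical triage annotations systematically under-grade women’s pain, any model fitted to those labels inherits the same underestimation.

```python
import random

random.seed(0)

# Hypothetical setup: true severity is equal across sexes, but a fraction of
# historical annotators dismissed severe pain in women as non-urgent.
def make_records(n=10_000, downgrade_rate=0.2):
    records = []
    for _ in range(n):
        sex = random.choice(["F", "M"])
        severe = random.random() < 0.5   # true severity: identical distribution
        label = severe
        if sex == "F" and severe and random.random() < downgrade_rate:
            label = False                # biased annotation: pain dismissed
        records.append((sex, severe, label))
    return records

# A trivial "model" that, like any learner, fits the labels it is given:
# it estimates P(urgent | sex) from the historical annotations.
def fit_urgency_rate(records):
    rates = {}
    for sex in ("F", "M"):
        labels = [label for s, _, label in records if s == sex]
        rates[sex] = sum(labels) / len(labels)
    return rates

records = make_records()
rates = fit_urgency_rate(records)
print(rates)  # the learned urgency rate for women is lower, despite equal true severity
```

The point of the sketch is that no explicit rule about sex was ever written: the disparity enters purely through the training labels, which is why it is hard to detect after the fact.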

Dr Raheleh Kafieh, an assistant professor of bioengineering at Durham University, warns: “AI tools are only as reliable as the data or the context that they are given. If limited, biased, or poorly annotated datasets are used, the algorithms fall into blind spots.”

Limited Data

The scale of this problem becomes clear when examining the AI tools for female-specific diseases already hitting the market.

Take Ziwig Endotest, a French diagnostic tool that uses AI-powered analysis of saliva-based microRNA to detect endometriosis. The company claims high sensitivity and specificity, and the test is already available in multiple European countries. 

Yet one of Ziwig’s key studies, published in 2023, included only 200 women aged 18 to 43 − all of them formally diagnosed with endometriosis or suspected to have it, all required to speak “good French” and be affiliated with the French health system.

For a tool being commercialized across the EU, the sample size appears not only small but remarkably specific to French demographics and healthcare structures. No large-scale, peer-reviewed follow-up studies seem to have been published, and Ziwig did not respond to several interview requests. 

“Until a thorough clinical trial has been completed with the results being published in a peer-reviewed venue, I think it is too early to say whether Ziwig’s Endotest will be safe, accurate, ethical, and useful to people who might be suffering from endometriosis,” stresses Bianca Schor, an AI researcher in women’s health at Amsterdam UMC.

A 2024 study on AI for diagnosing endometriosis confirms such cautious expectations. Researchers found AI systems with 100 features being tested on sample sizes of fewer than 100 cases − while at least 1,000 are typically required for reliable results.

Such small datasets lead to overfitting: the AI learns patterns specific to a limited group that do not hold when applied to more diverse real-world cases, producing unreliable predictions, especially when patient characteristics such as age, ethnicity, geography, or symptom presentation differ.
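A minimal sketch of why roughly 100 features and fewer than 100 cases invite overfitting. This is hypothetical and uses a simple nearest-neighbor “memorizer” rather than any specific endometriosis model; the patient data is randomly generated:

```python
import random

random.seed(1)

def true_risk(features):
    # The underlying relationship the model is supposed to learn.
    return sum(features) > 0

def make_patients(n, n_features=100):
    # Synthetic "patients": 100 features each, labeled by the true rule.
    return [
        ([random.gauss(0, 1) for _ in range(n_features)], None)
        for _ in range(n)
    ] and [
        (x, true_risk(x))
        for x, _ in [([random.gauss(0, 1) for _ in range(n_features)], None) for _ in range(n)]
    ]

def one_nn_predict(train, x):
    # 1-nearest-neighbor: predict the label of the closest training patient,
    # i.e. pure memorization of the training set.
    nearest = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
    return nearest[1]

def accuracy(train, data):
    return sum(one_nn_predict(train, x) == y for x, y in data) / len(data)

train = make_patients(80)    # fewer than 100 cases, 100 features each
test = make_patients(500)    # "new" patients the model has never seen

train_acc = accuracy(train, train)  # 1.0: every case is its own nearest neighbor
test_acc = accuracy(train, test)    # far lower: the memorized patterns do not generalize
```

The perfect training score looks impressive in a small validation, which is exactly why studies evaluated only on the cases used to build the model can overstate performance.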

The limited sample size is not the only issue. Key variables are often entirely missing in current AI tool development for gynecological uses. Hormonal life stages, which have “huge effects on the body and feelings,” are largely ignored, notes Kafieh. “Algorithms for pre-menopausal women should be completely different from those for post-menopausal ones.”

Even available data is often fragmented and blanks out the context needed to build robust, generalizable models. “Sometimes we get ultrasound data, but we don’t have enough clinical data associated with the image, which makes it hard to train the model,” explains Dr. Annalisa Occhipinti, associate professor at Teesside University. 

“You don’t know if that specific data type was associated with one condition or multiple conditions, or what the age of the patient was.” Her point highlights a central barrier to developing effective tools: poor data integration and documentation make it difficult not only to train reliable algorithms, but also to understand what the AI systems are actually learning.

Building collaborative algorithms

Morley warns of potential “automation bias” in gynecological care: Clinicians might end up over-relying on AI tools, assuming them to be objective. “The risk is especially high right now, as there is a lot of excitement around AI tools, and limited knowledge, skills, and experience in using them,” she says. 

Some initiatives, however, seek to close scientific gaps in algorithm development. As part of her academic role, Kafieh is actively involved in research projects. Although her expertise is not specifically in gynecology, her AI-driven diagnostic work addresses other women’s health conditions, such as multiple sclerosis. To develop a more collaborative algorithm − one whose instances inform each other whenever they solve a sub-problem − she involves diverse patient groups across age, race, and other factors.

“At the beginning, we ask patients what kind of algorithm would actually help them and how it might support diagnosis,” Kafieh explains. “Their input shapes the project’s direction.” 

At this point, she notes, “women sometimes ask whether hormonal changes are factored into the algorithm. The honest answer is, not yet.” 

Funding remains a major obstacle. According to Kafieh, financial support for this research is time-limited and still directed towards diseases affecting men or both sexes, rather than women in particular.

When researchers have to recruit their own patients for studies, the scarcity of funding further explains why many trials end up using small datasets, adds Occhipinti. “Recruiting and tracking patients over time takes a lot of resources.”

AI has the potential to transform women’s healthcare by reducing waiting times for diagnoses or spotting patterns doctors might miss. But we are nowhere near unlocking this potential.

Raluca Besliu

Former Algorithmic Accountability Reporting Fellow (2024-2025)


Raluca Besliu is an independent Romanian journalist who lived and worked in West Africa, Germany, and the United States. She has published more than 600 articles on topics ranging from environmental and political affairs in Eastern Europe to human rights abuses in African countries, and has been featured in esteemed publications, such as The New York Times and Euronews. In 2024, she worked on an investigation into the use of synthetic media ahead of the European Parliament elections. Published in a two-part series in Il Manifesto, the research revealed the psychological and social effects of deepfakes, particularly on female politicians. She also contributed to fostering AI-related knowledge exchange and skills development within the journalistic community of the Bosch Alumni Network (BAN), a community of social changemakers supported by the Bosch Foundation.