ChatGPT Health recommends delayed care in over 50% of emergency-level cases, study finds

A new independent study has raised serious concerns about the reliability of artificial intelligence tools in healthcare, warning that OpenAI’s ChatGPT Health may fail to recognise life-threatening medical situations.

The research found that the AI-powered health assistant, which allows users in the United States to connect their medical records and receive medical guidance, incorrectly delayed urgent care recommendations in more than half of simulated emergency cases.

ChatGPT Health, launched in January 2026, is reportedly used by around 40 million adults in the United States every day for health-related advice. But the findings suggest that while the tool can identify clear-cut emergencies, it may struggle with more complex scenarios that require clinical judgement.

Study raises safety concerns

The safety evaluation, published in the journal Nature Medicine, examined how the AI system responded to a range of medical scenarios. Researchers from the Icahn School of Medicine at Mount Sinai created 60 simulated patient cases that ranged from mild illnesses to critical emergencies.

Each scenario was reviewed by three independent doctors using established clinical guidelines to determine the appropriate level of care. The research team then generated nearly 1,000 responses from ChatGPT Health under varying conditions, including changes in patient gender, the addition of laboratory results, and input from family members.

The findings were troubling. In 52 per cent of cases classified as medical emergencies by doctors, the AI tool recommended less urgent care than required. In several cases involving serious conditions such as diabetic ketoacidosis or impending respiratory failure, the system suggested patients seek evaluation within 24 to 48 hours rather than immediately visiting an emergency department.

In one simulated scenario, a woman with symptoms suggesting she was suffocating was repeatedly advised to schedule a future medical appointment. In eight out of ten attempts, the system failed to recommend immediate emergency care despite the severity of the situation.

Dr Ashwin Ramaswamy, lead author of the study, said the tool's performance depended heavily on how clear-cut the emergency was.

“ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions,” Ramaswamy said.

“But it struggled in more nuanced situations where the danger is not immediately obvious, and those are often the cases where clinical judgement matters most.”

Inconsistent responses and false alarms

The study also highlighted other concerning patterns in the AI system’s responses. In lower-risk scenarios, ChatGPT Health often reacted too aggressively, recommending urgent medical care for situations that did not require immediate attention.

In 64.8 per cent of cases that the reviewing doctors had classified as not requiring urgent care, the system incorrectly advised seeking emergency medical assistance.

The system’s handling of mental health situations also raised questions. ChatGPT Health is designed to direct users to suicide crisis support when it detects high-risk situations. However, the study found that these alerts were sometimes triggered in lower-risk cases while failing to appear when users described detailed plans to harm themselves.

According to the researchers, this inversion of risk signals could create dangerous situations in real-world use.

The research also examined how external influences affected the AI’s recommendations. When family members or friends downplayed a patient’s symptoms within the simulation, the system frequently downgraded the urgency of care, suggesting less immediate medical attention.

Health experts say these inconsistencies highlight the potential dangers of relying too heavily on automated health tools.

Despite the findings, researchers stressed that AI health tools should not necessarily be abandoned. Instead, they argue that users and healthcare professionals must learn how to interpret AI-generated advice cautiously.

Alvira Tyagi, a medical student and co-author of the study, said understanding the limitations of such systems is becoming increasingly important as AI becomes more integrated into healthcare.

“These systems are changing quickly, so part of our training now must include learning how to evaluate their outputs critically, identify where they fall short, and use them in ways that protect patients,” she said.

OpenAI, responding to the study, said the research does not accurately reflect how people typically use ChatGPT Health or how the system is designed to function in real-world healthcare situations.

Still, the findings add to the growing debate over the role of artificial intelligence in medicine and the potential risks of relying on AI-driven advice in urgent medical situations.
