Case Study: Can AI Reduce Malpractice Claims in Emergency Medicine?

By: William S. Kanich, MD, JD

06/24

Emergency medicine is a challenging and rewarding specialty, but it also carries a significant risk of a malpractice claim. During the past five years, medical malpractice cases filed against emergency medicine physicians have increased by more than one-third, a trend opposite that of most specialties, which are seeing malpractice cases decrease in number. According to a study in the National Institutes of Health library, approximately 75% of emergency physicians will be named in a malpractice suit during their careers.

The most common allegations against emergency physicians are diagnosis-related — not surprising given the high stakes, wide range of conditions and limited information that characterize emergency medicine. The uncertainty and complexity of the clinical environment can cause errors in thinking, premature closure of the differential diagnosis and inappropriate anchoring on an erroneous diagnosis.

To address those challenges, MagMutual conducted an informal study to evaluate the impact of ChatGPT — an artificial intelligence (AI) tool — on the diagnostic accuracy and malpractice risk of emergency physicians. Though our study had limitations, including a small sample size, retrospective design and single-observer interpretation, we found that AI may be an effective tool to improve diagnostic accuracy and potentially reduce malpractice claims resulting from emergency treatment.

Please note: ChatGPT isn’t a perfect system and, along with its own inherent flaws and biases, isn’t designed specifically for medical diagnosis, as some AI applications are. Nor is it HIPAA-compliant, and the completeness and accuracy of its results aren’t guaranteed. Our study was designed only to experiment with a readily available AI tool, not as an attempt to endorse its use in clinical settings.

Testing AI’s Effectiveness

ChatGPT is a natural-language processing system designed to understand and generate human-like text based on the input it receives. It’s trained on a diverse range of internet text to develop responses to a wide variety of prompts and questions. While ChatGPT represents a significant improvement over older, similar AI applications, it also has drawbacks, including:

Inability to learn from interactions
Inability to understand and respond to emotions
Potentially biased or inappropriate outputs

Nevertheless, because ChatGPT is commonly available and widely used, we decided to test it against emergency medicine claims to evaluate its responses. We used a de-identified retrospective cohort design (which looks back at historical data after we remove all identifying protected health information) to analyze closed malpractice claims from a national database at MagMutual. We used the following criteria to select claims:

Closed between 2020 and 2023
“Delay” or “failure to diagnose” was cited as the allegation
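As a rough illustration only, the selection step could be sketched in Python as shown below. The `Claim` fields, the allegation wording and the specialty label are hypothetical stand-ins rather than MagMutual's actual database schema, and the specialty check reflects the exclusion of non-emergency physicians noted just below.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """Hypothetical, de-identified claim record (field names are assumptions)."""
    claim_id: str
    closed_on: date
    allegation: str
    specialty: str


def is_eligible(claim: Claim) -> bool:
    """Apply the stated criteria: closed 2020-2023, diagnosis-related allegation."""
    allegation = claim.allegation.lower()
    closed_in_window = 2020 <= claim.closed_on.year <= 2023
    # Assumed wording: the article only says "delay" or "failure to diagnose".
    diagnosis_related = "delay" in allegation or "failure to diagnose" in allegation
    # Assumed label for the emergency-physician restriction.
    emergency_physician = claim.specialty.lower() == "emergency medicine"
    return closed_in_window and diagnosis_related and emergency_physician


def select_claims(claims: list[Claim]) -> list[Claim]:
    """Return only the claims that meet every criterion."""
    return [claim for claim in claims if is_eligible(claim)]
```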

We excluded claims that involved non-emergency physicians or other allegations. We then entered two key data points from each claim into ChatGPT:

Emergency Medical System (EMS) notes, nursing triage notes, vital signs and the history and physical exam of the defendant provider
Diagnostic studies that were recorded in the encounter

For each data point, we prompted ChatGPT to provide a preliminary differential diagnosis, a most likely diagnosis and recommended diagnostic studies. After the second data point, we also asked it for the remaining differential considerations, a likely diagnosis and a recommended disposition (admit or discharge).
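The exact prompts used in the study are not reproduced here. The sketch below shows one possible reading of the two-step process in Python: the first prompt carries the initial notes, and the second adds the recorded diagnostic studies. The prompt wording and the function names are hypothetical, and the code only builds text; it does not call any AI service.

```python
# Requests made after the first data point (notes, vitals, history and exam).
STAGE_ONE_REQUESTS = (
    "a preliminary differential diagnosis",
    "a most likely diagnosis",
    "recommended diagnostic studies",
)

# Requests made after the second data point (recorded diagnostic studies).
STAGE_TWO_REQUESTS = (
    "remaining differential considerations",
    "a likely diagnosis",
    "a recommended disposition (admit or discharge)",
)


def build_prompt(record_text: str, requests: tuple[str, ...]) -> str:
    """Combine de-identified record text with a bulleted list of requests."""
    asks = "\n".join(f"- {request}" for request in requests)
    return (
        "De-identified emergency department record:\n"
        f"{record_text}\n\n"
        "Please provide:\n"
        f"{asks}"
    )


def build_case_prompts(initial_notes: str, study_results: str) -> tuple[str, str]:
    """Return the stage-one and stage-two prompts for a single claim."""
    first = build_prompt(initial_notes, STAGE_ONE_REQUESTS)
    # Stage two repeats the notes so the studies are read in context.
    second = build_prompt(
        f"{initial_notes}\n\nDiagnostic studies:\n{study_results}",
        STAGE_TWO_REQUESTS,
    )
    return first, second
```

Only de-identified text should ever be passed to a function like this, since ChatGPT is not HIPAA-compliant.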

We compared ChatGPT responses to the actions of the emergency physicians and outcomes for plaintiffs to determine whether AI could have helped the physicians avoid diagnostic errors and malpractice claims. The standard of care (SOC) evidenced by the emergency physicians was also evaluated as an independent variable in judging the effectiveness of using ChatGPT to avoid a claim.¹ 

Results & Projections

In the end, we determined that ChatGPT provided accurate and timely diagnoses and recommendations for slightly more than half the claims we examined (52%). Of those claims, two-thirds involved poor SOC and one-third involved good SOC. ChatGPT also suggested appropriate diagnostic studies and dispositions in most cases.

However, in the remaining 48% of cases, ChatGPT’s suggestions did not appear likely to have prevented the claim. In those cases, the diagnosis was too rare or complex, the available information was insufficient or misleading, or the physician missed something critical on the physical exam.
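To put these proportions on a common footing, the short Python sketch below expresses each group as a share of all claims reviewed. It assumes the two-thirds and one-third split is exact, which the rounded figures above may not be, and the function name is ours.

```python
from fractions import Fraction


def claim_breakdown(
    helped: Fraction = Fraction(52, 100),
    poor_soc_share: Fraction = Fraction(2, 3),
) -> dict[str, Fraction]:
    """Split all reviewed claims into three groups, as shares of the total."""
    return {
        # ChatGPT was accurate and the standard of care was rated poor.
        "helped_poor_soc": helped * poor_soc_share,
        # ChatGPT was accurate and the standard of care was rated good.
        "helped_good_soc": helped * (1 - poor_soc_share),
        # ChatGPT's suggestions did not appear likely to prevent the claim.
        "not_helped": 1 - helped,
    }


if __name__ == "__main__":
    for group, share in claim_breakdown().items():
        print(f"{group}: {float(share):.1%}")
```

Under that assumption, roughly 35% of all claims reviewed combined an accurate ChatGPT response with poor SOC, and roughly 17% combined an accurate response with good SOC.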

The study suggests that AI could be a promising tool for augmenting diagnostic skills and potentially reducing the risk of misdiagnosis by emergency physicians. It may also support their practice by providing a second opinion, a reminder or a confirmation, especially in cases of uncertainty or difficulty. AI may further help physicians document their clinical reasoning and decision-making, which can improve both their communication and the defensibility of their care.

What AI can’t do is replace the clinical judgment or responsibilities of physicians. We noted errors that an emergency physician could have made by following AI blindly (for example, ChatGPT recommended diagnostic studies to evaluate for giant cell arteritis in a patient who was younger than the typical age of risk for the condition). Even AI systems designed specifically to enhance diagnostics can suffer from low specificity and high false-positive rates, which undermine their reliability and lead to unnecessary testing.

Final Recommendations

Considering both the limitations of ChatGPT in general and our study in particular, we are not suggesting that physicians adopt it as a tool for clinical diagnosis. However, we do believe that appropriate use of AI in diagnosis and other medical settings could benefit both providers and administrators, particularly as the technology is developed for use in healthcare. Ultimately, all physicians — emergency and otherwise — should use any artificial intelligence tool with caution and discretion, always verifying the information and the sources provided by the system.

Bill Kanich is a board-certified emergency medicine physician who practiced in Charleston, South Carolina, before joining MagMutual in 2017. He serves as chair of the MagMutual Board of Directors.

Disclaimer: The information provided by ChatGPT is intended for general informational purposes only. While efforts are made to ensure the accuracy and reliability of the information presented, neither ChatGPT nor MagMutual can guarantee its completeness, suitability or validity for any particular purpose. Users are advised to verify the information obtained from ChatGPT with other credible sources and to exercise their own judgment when applying it to specific situations.

¹ Standard of care rankings are determined through independent analysis by peer physicians on our Medical Faculty panel.

Disclaimer

The information provided in this resource does not constitute legal, medical or any other professional advice, nor does it establish a standard of care. This resource has been created as an aid to you in your practice. The ultimate decision on how to use the information provided rests solely with you, the PolicyOwner.
