When it comes to artificial intelligence, hype, hope, and foreboding are suddenly everywhere. But disruptive technology has long been making waves in health care: from IBM Watson’s failed foray into medicine (and the long-held hope that AI tools might one day beat doctors at detecting cancer in medical images) to well-documented problems with algorithmic racial bias.
But, behind the public squabbling over fanfare and failure, there is a messy reality of fumbling adoption that goes largely untold. For years, health care systems and hospitals have struggled with ineffective and, in some cases, doomed attempts to adopt AI tools, according to a new study led by researchers at Duke University. The study, posted online as a preprint, pulls back the curtain on these messy implementations while also mining them for lessons learned. Drawing on eye-opening accounts from 89 professionals involved in rollouts at 11 health care organizations, including Duke Health, Mayo Clinic, and Kaiser Permanente, the authors assemble a practical framework that health systems can follow as they try to roll out new AI tools.
New AI tools continue to emerge. Just last week, a study in JAMA Internal Medicine found that ChatGPT (version 3.5) decisively outperformed physicians at providing high-quality, empathetic answers to medical questions that people had posted on the r/AskDocs subreddit. The superior responses, as judged by a panel of three physicians with relevant medical expertise, suggest that an AI chatbot like ChatGPT could one day help physicians tackle the growing burden of responding to medical messages sent through online patient portals.
This is no small feat. Higher volumes of patient messages are associated with higher rates of physician burnout. According to the study’s authors, an effective AI chat tool could not only relieve this exhausting burden, giving clinicians a break and freeing them to direct their efforts elsewhere, but it could also cut unnecessary office visits, boost patient adherence to medical guidance, and improve patient health outcomes overall. Moreover, better message responsiveness could improve patient equity by offering more online support to patients who are less likely to book appointments, such as those with mobility issues, work limitations, or concerns about medical bills.
It all sounds great, as the promise of so many health care AI tools does. But the study has significant limitations and caveats that make this application’s true potential harder to gauge than it seems. For starters, the kinds of questions people ask on a Reddit forum don’t necessarily represent the ones they would ask a doctor they know and (hopefully) trust. And the quality and types of answers that volunteer doctors give to random people on the Internet may not match those they give to their own patients, with whom they have an established relationship.
But even if the study’s core findings held up in real doctor-patient interactions through real patient portal messaging systems, many more steps would need to be taken before a chatbot could reach those lofty goals, as the Duke-led preprint study makes clear.
To save time, the AI tool would need to integrate well into the health system’s clinical applications and each physician’s specific workflow. Doctors would likely need dependable technical support around the clock in case of malfunctions. And clinicians would need to strike a balance of trust in the tool: one such that they don’t blindly pass AI-generated responses along to patients without review, but also know they won’t have to spend so much time editing responses that the tool’s time savings are canceled out.
And after managing all of that, a health system would have to build an evidence base showing that the tool works as hoped in its own setting. That means developing systems and metrics for tracking outcomes, such as physicians’ time, patient equity, adherence, and health outcomes.
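As a rough illustration, and not something described in the study, that kind of tracking might start with aggregating simple audit records for each AI-drafted message. The field names, patient groupings, and metrics below are invented for the sketch; a real system would pull them from the portal’s audit logs and define them with clinical stakeholders.

```python
# A minimal, hypothetical sketch of outcome tracking for an AI message-drafting tool.
# All field names and groupings are invented for illustration; real systems would pull
# these from EHR/portal audit logs and define the metrics with clinical stakeholders.
from dataclasses import dataclass
from collections import defaultdict
from statistics import mean

@dataclass
class MessageAudit:
    physician_id: str
    patient_group: str        # e.g., a demographic or access-related cohort label
    minutes_spent: float      # clinician time on this message, including draft review
    draft_accepted: bool      # was the AI draft sent with minimal edits?

def summarize(audits: list[MessageAudit]) -> dict[str, dict[str, float]]:
    """Average clinician minutes and draft-acceptance rate, broken out by patient group."""
    by_group: dict[str, list[MessageAudit]] = defaultdict(list)
    for a in audits:
        by_group[a.patient_group].append(a)
    return {
        group: {
            "avg_minutes_per_message": mean(a.minutes_spent for a in items),
            "draft_acceptance_rate": sum(a.draft_accepted for a in items) / len(items),
            "message_count": len(items),
        }
        for group, items in by_group.items()
    }

if __name__ == "__main__":
    sample = [
        MessageAudit("dr_a", "mobility_limited", 2.5, True),
        MessageAudit("dr_a", "mobility_limited", 6.0, False),
        MessageAudit("dr_b", "general", 1.5, True),
    ]
    print(summarize(sample))
```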
These are weighty demands on an already complex and overwhelmed health system. As the researchers note in the introduction of their preprint:
Drawing on the Swiss cheese model of pandemic defense, every layer of the health care AI ecosystem currently contains large holes that make the broad diffusion of poorly performing products inevitable.
The study lays out an eight-point framework based on steps in the decision-making process, whether the decisions come from an executive, an IT leader, or a frontline clinician. The process involves: 1) identifying and prioritizing a problem; 2) identifying how AI could help; 3) developing ways to assess an AI tool’s outcomes and successes; 4) figuring out how to integrate it into existing workflows; 5) validating the safety, efficacy, and equity of the AI in the health system before clinical use; 6) rolling out the AI tool with communication, training, and trust building; 7) monitoring; and 8) updating or decommissioning the tool over time.
‘A constant challenge’
Hospital systems ran into difficulties at each of these steps, according to interview responses from the 89 professionals and clinicians, whose identities were kept anonymous in the study.
That includes even the first steps of identifying a problem where AI can help. “At the moment, a lot of AI solutions are basically trying to do the same thing a doctor does. So, it’s like, take an X-ray, read it like a radiologist would. But we already have radiologists, so what is that doing?” said one source, identified anonymously as key personnel involved in AI adoption.
Evaluating the effectiveness of an AI tool, and whether it is suited to a particular problem, has also been a common struggle. “I don’t think we even have a good understanding of how the algorithm’s performance is measured, let alone its performance across different racial and ethnic groups,” said another source.
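For a sense of what measuring performance across groups could look like, here is a minimal sketch, not taken from the study, that computes a single discrimination metric (AUROC) separately for each demographic subgroup. The column names and data are invented, and a real validation would also need calibration checks, confidence intervals, and clinically meaningful endpoints.

```python
# A minimal, hypothetical sketch of checking a model's performance across subgroups.
# Column names and data are invented; real validation needs proper cohorts and far
# more than a single discrimination metric.
import pandas as pd
from sklearn.metrics import roc_auc_score

def auroc_by_group(df: pd.DataFrame, group_col: str = "race_ethnicity") -> pd.Series:
    """AUROC of the model's risk score against the observed outcome, per subgroup."""
    return df.groupby(group_col).apply(
        lambda g: roc_auc_score(g["outcome"], g["risk_score"])
    )

if __name__ == "__main__":
    data = pd.DataFrame({
        "race_ethnicity": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "outcome":        [1,   0,   1,   0,   1,   0,   0,   1],
        "risk_score":     [0.9, 0.2, 0.7, 0.4, 0.6, 0.5, 0.3, 0.4],
    })
    print(auroc_by_group(data))
```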
But getting the algorithm right is just part of the challenge. Getting it to work with doctors is another. Even relatively simple tools, such as AI-based methods for automatically completing triage notes in emergency departments, have flopped in practice, according to those interviewed. “When I first heard about it, I thought, ‘This is a no-brainer’; clinicians are going to love autocomplete, you know,” said one interviewee. But, “it wasn’t as popular as you’d expect. And it’s not because the algorithm is bad. The algorithm works well. It just doesn’t fit into the workflow.”
Interviewees said that technical integration into a doctor’s workflow has to be coupled with trust and understanding, plus just the right amount of accuracy. As one interviewee explained, this makes adoption tricky:
If we had a situation where, basically, a tool was right all the time, doctors would come to trust it and stop scrutinizing it. If we had a system that was wrong all the time, doctors wouldn’t use it very often. But if we have a system that is wrong just often enough that doctors have to check it a decent amount, and they find themselves fixing it a decent amount, it’s in that sweet spot. It’s hard for me to imagine that staying in that spot would be so sweet, or, frankly, such a good use of a doctor’s time.
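To put rough numbers on the tradeoff the interviewee describes, here is a back-of-the-envelope sketch with entirely invented figures: the expected clinician time per message is the review time plus the error rate times the correction time, and the tool only pays off while that total stays below the time it takes to write a reply from scratch.

```python
# A back-of-the-envelope sketch of the "sweet spot" tradeoff described above.
# All numbers are invented for illustration only.
def expected_minutes(review_min: float, fix_min: float, error_rate: float) -> float:
    """Expected clinician minutes per message when using an AI drafting tool."""
    return review_min + error_rate * fix_min

if __name__ == "__main__":
    WRITE_FROM_SCRATCH = 4.0   # minutes to compose a reply without the tool (assumed)
    REVIEW = 1.0               # minutes to read and sanity-check an AI draft (assumed)
    FIX = 5.0                  # minutes to rework a draft that is wrong (assumed)
    for err in (0.0, 0.2, 0.5, 0.8):
        total = expected_minutes(REVIEW, FIX, err)
        verdict = "saves time" if total < WRITE_FROM_SCRATCH else "costs time"
        print(f"error rate {err:.0%}: {total:.1f} min/message ({verdict})")
```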
In many cases, AI tools have simply fallen by the wayside amid staff turnover and physicians’ reluctance to learn new tools when they can barely keep up with the work they already have. “It’s a constant challenge,” said one IT source.
And even when clinicians do embrace new tools, measuring and monitoring outcomes remains a struggle. “I think most health systems are pretty clueless about how well this works in individual patient cases,” said one anonymous source identified as a key specialist focused on regulation. “And that’s part of the reason there’s nothing close to a learning health system, because we’re not very good at monitoring outcomes, except in a few odd, unusual cases.”
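Monitoring outcomes at the level of individual patient cases would, at a minimum, mean recording each prediction alongside the outcome that eventually materialized and re-checking performance over time. The sketch below is purely illustrative and assumes a hypothetical prediction log; it is not drawn from the study or from any particular health system.

```python
# A hypothetical sketch of ongoing outcome monitoring: store each prediction, join it
# with the eventual outcome, and re-check performance over recent windows of time.
# The log format, window size, and alert threshold are all invented for illustration.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PredictionRecord:
    timestamp: datetime
    predicted_risk: float      # model output at the time of the decision
    observed_outcome: bool     # filled in later, once the outcome is known

def recent_accuracy(log: list[PredictionRecord], window_days: int = 30,
                    threshold: float = 0.5) -> float:
    """Fraction of correct calls (risk >= threshold vs. outcome) in the recent window."""
    cutoff = datetime.now() - timedelta(days=window_days)
    recent = [r for r in log if r.timestamp >= cutoff]
    if not recent:
        return float("nan")
    hits = sum((r.predicted_risk >= threshold) == r.observed_outcome for r in recent)
    return hits / len(recent)

if __name__ == "__main__":
    now = datetime.now()
    log = [
        PredictionRecord(now - timedelta(days=3), 0.8, True),
        PredictionRecord(now - timedelta(days=10), 0.4, False),
        PredictionRecord(now - timedelta(days=12), 0.7, False),  # a miss
    ]
    acc = recent_accuracy(log)
    print(f"30-day accuracy: {acc:.0%}")
    if acc < 0.8:  # invented alert threshold
        print("Performance below target; flag for clinical and technical review.")
```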
Altogether, the interview responses suggest that to truly harness the potential of AI in healthcare, health systems may need to create “new teams to interact with or monitor the system, new communication strategies for maintaining professional boundaries, and new expertise.”