A new study has found that the large language model ChatGPT Plus, working on its own, achieved better diagnostic accuracy than physicians, whether or not the physicians used the tool themselves.
The research, conducted by academics at UVA Health, involved 50 physicians in family, internal, and emergency medicine who put ChatGPT Plus to the test.
Half of the participants were randomly assigned to use the tool to diagnose complex cases, while the other half relied solely on conventional methods like medical reference sites and Google.
With many hospitals already using AI for patient care, a new study found that using Chat GPT Plus does not significantly improve doctors’ diagnoses. #MedX 🔎 https://t.co/YRzZcrywv3 pic.twitter.com/m0AfoDBXhb
— UVA Health (@uvahealthnews) November 13, 2024
The two groups’ diagnoses were then compared and found to be similar in accuracy. It was when ChatGPT Plus worked alone, however, that it outperformed both groups.
“Our study shows that AI alone can be an effective and powerful tool for diagnosis,” said Andrew S. Parsons, who oversees the teaching of clinical skills to medical students at the University of Virginia School of Medicine and co-leads the Clinical Reasoning Research Collaborative.
“We were surprised to find that adding a human physician to the mix actually reduced diagnostic accuracy though improved efficiency. These results likely mean that we need formal training in how best to use AI.”
ChatGPT Plus performs well in tests, with a median diagnostic accuracy of more than 92%
The researchers then went one step further, launching a randomized controlled trial across three hospitals: UVA Health, Stanford, and Harvard’s Beth Israel Deaconess Medical Center.
The participants made diagnoses for ‘clinical vignettes’ based on real-life patient-care cases. Each case study included details of the patient’s history, physical exam findings, and lab test results.
The researchers then scored the results and examined the speed at which the two groups made their decisions.
In a press release, UVA Health explained the findings: “The median diagnostic accuracy for the docs using ChatGPT Plus was 76.3%, while the results for the physicians using conventional approaches were 73.7%.
“The ChatGPT group members reached their diagnoses slightly more quickly overall – 519 seconds compared with 565 seconds.
“The researchers were surprised at how well ChatGPT Plus alone performed, with a median diagnostic accuracy of more than 92%. They say this may reflect the prompts used in the study, suggesting that physicians likely will benefit from training on how to use prompts effectively.”
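The press release’s point about prompting invites a concrete illustration. Below is a minimal sketch of how a clinical vignette might be posed to a model as a single structured prompt, written against the OpenAI Python SDK. The vignette details, prompt wording, and model name are illustrative assumptions; the article does not reproduce the study’s actual prompts.

```python
# Illustrative sketch only: the study's real prompts are not published in this
# article, so the case details, wording, and model name below are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical vignette mirroring the structure the article describes:
# patient history, physical exam findings, and lab test results.
vignette = {
    "history": "58-year-old with two weeks of progressive exertional dyspnea.",
    "physical_exam": "Elevated JVP; bilateral lower-extremity pitting edema.",
    "labs": "BNP 1,450 pg/mL; creatinine 1.4 mg/dL.",
}

# Present the whole case in one structured pass and ask for a ranked
# differential, rather than feeding the model fragments piecemeal.
prompt = (
    "This is a fictional diagnostic exercise.\n"
    f"History: {vignette['history']}\n"
    f"Physical exam: {vignette['physical_exam']}\n"
    f"Labs: {vignette['labs']}\n"
    "List the three most likely diagnoses with brief supporting findings, "
    "then name the single most likely diagnosis."
)

response = client.chat.completions.create(
    model="gpt-4o",  # stand-in; the article refers only to "ChatGPT Plus"
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Whatever form the study’s prompts actually took, structuring the full case up front like this is one example of the prompt discipline the researchers say physicians may need training in.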
The researchers, however, have cautioned that the chatbot would likely fare less well in real-world practice, where other aspects of clinical reasoning come into play.
“As AI becomes more embedded in healthcare, it’s essential to understand how we can leverage these tools to improve patient care and the physician experience,” Parsons said.
“This study suggests there is much work to be done in terms of optimizing our partnership with AI in the clinical environment.”
Featured Image: AI-generated via Ideogram