EUSEM: Diagnostic ability of ChatGPT comparable to emergency department clinicians

The diagnostic ability of the artificial intelligence system ChatGPT is similar to that of emergency department clinicians when examining some complex diagnostic cases, according to a new study presented at the recent congress of the European Society for Emergency Medicine (EUSEM).

Simultaneously published in the Annals of Emergency Medicine, the study used data from 30 undiagnosed patients who were ultimately given a single proven diagnosis.

The research team retrospectively investigated the ability of ChatGPT to generate accurate differential diagnoses based on the physician notes recorded at the initial emergency department presentation. The patient data was fed into two versions of ChatGPT: the free 3.5 version and the 4.0 subscriber version.

Clinicians correctly included the diagnosis in their top five differential diagnoses for 83% of cases. For ChatGPT v3.5 this was 77% and for v4.0 it was 87%. When laboratory results were also taken into account, the correct diagnosis was included within clinicians’ top five likely diagnoses in 87% of the cases, compared with 97% for ChatGPT v3.5 and 87% for v4.0.

When laboratory results were included in the assessment, clinicians chose the correct leading diagnosis in 53% of the cases, which was of comparable accuracy to ChatGPT v3.5 at 60% and v4.0 at 53%.

Commenting on these diagnostic results, lead author Dr Hidde ten Berg said: ‘We found that ChatGPT performed well in generating a list of likely diagnoses and suggesting the most likely option. We also found a lot of overlap with the doctors’ lists of likely diagnoses. Simply put, this indicates that ChatGPT was able to suggest medical diagnoses much like a human doctor would.

‘For example, we included a case of a patient presenting with joint pain that was alleviated with painkillers, but redness, joint pain and swelling always recurred. In the previous days, the patient had a fever and sore throat. A few times there was a discolouration of the fingertips. Based on the physical exam and additional tests, the doctors thought the most likely diagnosis was probably rheumatic fever, but ChatGPT was correct with its most likely diagnosis of vasculitis.’

Professor Youri Yordanov from the St Antoine Hospital emergency department in Paris, France, who is chair of the EUSEM 2023 abstract committee but was not involved in the research, added: ‘We are a long way from using ChatGPT in the clinic, but it’s vital that we explore new technology and consider how it could be used to help doctors and their patients.

‘People who need to go to the emergency department want to be seen as quickly as possible and to have their problem correctly diagnosed and treated. I look forward to more research in this area and hope that it might ultimately support the work of busy health professionals.’

ChatGPT is an artificial intelligence system that is being increasingly explored in healthcare, although results to date have been mixed. For example, it has shown some promise in answering relatively straightforward questions in cardiology but performed less well on more complex vignettes.
