
Press Releases

Take a look at a selection of our recent media coverage:

ChatGPT outperformed trainee doctors’ respiratory assessments in new study

3rd October 2024

The rapidly developing area of technology and artificial intelligence (AI) within respiratory medicine and science was under the spotlight at this year’s European Respiratory Society (ERS) Congress, including the use of large language models (LLMs) such as ChatGPT to assess complex respiratory disease in children.

The LLM ChatGPT performed better than trainee doctors in assessing complex paediatric cases of respiratory disease, a study found, suggesting LLMs could be used to support patient triage.

UK researchers compared the performance of three LLMs (ChatGPT, Microsoft Bing and Google’s Bard) against early-career trainee doctors in responding to six paediatric respiratory clinical scenarios. Each scenario had no obvious diagnosis and no published evidence, guidelines or expert consensus pointing to a specific diagnosis or management plan.

The 10 trainee doctors were given an hour, with internet access but no access to LLMs, to answer each scenario in 200 to 400 words.

Responses were randomised and scored by six experts, both overall and on specific criteria: correctness, comprehensiveness, utility, plausibility, coherence and humanness.

ChatGPT (median overall score 7) outperformed Bard (median 6), Bing (median 4) and the trainee doctors (median 4) in all domains.

Bard scored better than the trainee doctors in coherence, with Bing and the trainee doctors scoring similarly.

The six experts were able to identify Bing and Bard’s responses as non-human, but not ChatGPT’s responses.

Dr Manjith Narayanan, lead author and consultant in paediatric pulmonology at the Royal Hospital for Children and Young People, Edinburgh, and honorary senior clinical lecturer at the University of Edinburgh, UK, said they did not find any obvious instances of ‘hallucinations’ – the term for false information provided by LLMs – in the responses.

‘Even though… we did not see any instance of hallucination… we need to be aware of this possibility and build mitigations against this,’ he said.

The research team plan to test LLMs against more senior doctors and investigate newer and more advanced versions of the technology.

Commenting on the findings, Professor Hilary Pinnock, ERS Education Council chair and professor of primary care respiratory medicine at the University of Edinburgh, said the study pointed to a ‘brave new world of AI-supported care’.

She added: ‘As the researchers have demonstrated, AI holds out the promise of a new way of working, but we need extensive testing of clinical accuracy and safety, pragmatic assessment of organisational efficiency, and exploration of the societal implications before we embed this technology in routine care.’

ERS Congress co-chair Professor Judith Löffler-Ragg said the research presented at this year’s event under the theme of ‘Humans and machines: getting the balance right’ was pioneering and should guide future developments.

‘It is extremely important that we view developments in technology, and specifically AI, with an open mind but also a critical eye,’ she said.

‘Our vision is to advance personalised medicine through the responsible use of AI, continuously improving respiratory medicine.’

EUSEM: Diagnostic ability of ChatGPT comparable to emergency department clinicians

27th September 2023

The diagnostic ability of the artificial intelligence system ChatGPT is similar to that of emergency department clinicians when examining some complex diagnostic cases, according to the findings of a new study presented at the recent European Society of Emergency Medicine (EUSEM) congress.

Simultaneously published in the Annals of Emergency Medicine, the study used data from 30 undiagnosed patients who were ultimately given a single proven diagnosis.

The research team retrospectively investigated the ability of ChatGPT to generate accurate differential diagnoses based on the physician notes recorded at the initial emergency department presentation. The patient data was fed into two versions of ChatGPT: the free 3.5 version and the 4.0 subscriber version.
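The paper’s exact prompting workflow is not detailed here, but the sketch below illustrates, under stated assumptions, how anonymised physician notes might be submitted to a chat-style LLM to request a ranked five-item differential diagnosis. The model name, prompt wording and get_differential helper are illustrative choices, not the authors’ pipeline.

```python
# Minimal illustrative sketch (not the study's actual method): send anonymised
# physician notes to a chat-style LLM and ask for a ranked top-five differential.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def get_differential(physician_notes: str, model: str = "gpt-4") -> str:
    """Ask the model for the five most likely diagnoses, most likely first."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": ("You are assisting an emergency physician. "
                         "List the five most likely diagnoses, most likely first.")},
            {"role": "user", "content": physician_notes},
        ],
        temperature=0,  # deterministic output makes case-by-case comparison easier
    )
    return response.choices[0].message.content

# Fictional example notes, loosely echoing the case described later in this article
notes = ("Recurrent joint pain, redness and swelling relieved by painkillers; "
         "recent fever and sore throat; occasional discolouration of the fingertips.")
print(get_differential(notes))
```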

Based on the physician notes, clinicians correctly included the diagnosis in their top five differential diagnoses in 83% of cases, compared with 77% for ChatGPT v3.5 and 87% for v4.0. When laboratory results were also taken into account, the correct diagnosis featured in the clinicians’ top five likely diagnoses in 87% of cases, compared with 97% for ChatGPT version 3.5 and 87% for version 4.0.

When laboratory results were included in the assessment, clinicians chose the correct leading diagnosis in 53% of the cases, which was of comparable accuracy to ChatGPT v3.5 at 60% and v4.0 at 53%.

Commenting on these diagnostic results, lead author Dr Hidde ten Berg said: ‘We found that ChatGPT performed well in generating a list of likely diagnoses and suggesting the most likely option. We also found a lot of overlap with the doctors’ lists of likely diagnoses. Simply put, this indicates that ChatGPT was able to suggest medical diagnoses much like a human doctor would.

‘For example, we included a case of a patient presenting with joint pain that was alleviated with painkillers, but redness, joint pain and swelling always recurred. In the previous days, the patient had a fever and sore throat. A few times there was a discolouration of the fingertips. Based on the physical exam and additional tests, the doctors thought the most likely diagnosis was probably rheumatic fever, but ChatGPT was correct with its most likely diagnosis of vasculitis.’

Professor Youri Yordanov from the St Antoine Hospital emergency department in Paris, France, who is chair of the EUSEM 2023 abstract committee but was not involved in the research, added: ‘We are a long way from using ChatGPT in the clinic, but it’s vital that we explore new technology and consider how it could be used to help doctors and their patients.

‘People who need to go to the emergency department want to be seen as quickly as possible and to have their problem correctly diagnosed and treated. I look forward to more research in this area and hope that it might ultimately support the work of busy health professionals.’

ChatGPT is an artificial intelligence system that is being increasingly explored in healthcare, although its value is currently varied. For example, it has shown some promise for relatively straightforward questions in cardiology but performed less well in more complex vignettes.

ChatGPT shows promise but only for low-complex cardiology questions

19th April 2023

Use of ChatGPT showed some promise for relatively straightforward questions in cardiology, but it performed less well in more complex case vignettes

ChatGPT shows some promise as an AI-assisted decision-support tool, particularly for questions that are relatively straightforward. However, it performed less well when providing answers to more complicated case vignettes.

Chat Generative Pre-trained Transformer (ChatGPT) is an interactive AI model that follows instructions and provides detailed responses, and it has the potential to assist with medical education and even clinical decision-making. In the current study, researchers set out to assess ChatGPT’s performance in answering cardiovascular questions and in providing suggestions for case vignettes. For the questions, the reference standard was the medical expert who developed them; for the 20 vignettes, it was the attending physician or consulted expert, and the advice provided was checked against clinical guidelines.

The straightforward cardiovascular questions related to several topics, including acute coronary syndrome, atrial fibrillation and cardiovascular risk management. The vignettes involved symptoms that were potentially due to a cardiac problem (e.g. chest pain, dyspnoea) or that required a diagnostic or treatment plan.
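As a rough, hedged sketch of how such an evaluation can be scored, the snippet below compares a model’s multiple-choice answers against an expert answer key and reports accuracy overall and per topic. The question IDs, topics and answers are fabricated placeholders, not the study’s materials.

```python
# Illustrative scoring sketch: compare model answers with an expert answer key
# and report accuracy overall and per topic. All data here are placeholders.
from collections import defaultdict

answer_key = {  # question id -> (topic, correct option)
    "q1": ("acute coronary syndrome", "B"),
    "q2": ("atrial fibrillation", "D"),
    "q3": ("cardiovascular risk management", "A"),
}
model_answers = {"q1": "B", "q2": "D", "q3": "C"}

per_topic = defaultdict(lambda: [0, 0])  # topic -> [correct, total]
for qid, (topic, correct_option) in answer_key.items():
    per_topic[topic][1] += 1
    if model_answers.get(qid) == correct_option:
        per_topic[topic][0] += 1

total_correct = sum(correct for correct, _ in per_topic.values())
total = sum(count for _, count in per_topic.values())
print(f"Overall accuracy: {total_correct / total:.0%}")
for topic, (correct, count) in per_topic.items():
    print(f"{topic}: {correct}/{count} correct")
```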

ChatGPT performance

Across 50 multiple-choice cardiovascular questions, ChatGPT answered 74% (37/50) correctly. Accuracy varied from 80% for questions on coronary artery disease to 60% for those on cardiovascular risk management. For the vignettes, ChatGPT correctly answered 90% of questions seeking primary care advice, but only 50% of the more complicated questions.

The authors felt that ChatGPT performed well with straightforward, low complexity questions. However, they felt more work was needed to fully evaluate the system’s potential.

Citation
Harskamp RE et al. Performance of ChatGPT as an AI-assisted decision support tool in medicine: a proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2). medRxiv 2023.
