
ChatGPT outperformed trainee doctors in respiratory assessments in new study

The rapidly developing area of technology and artificial intelligence (AI) within respiratory medicine and science was under the spotlight at this year’s European Respiratory Society (ERS) Congress, including the use of large language models (LLMs) such as ChatGPT to assess complex respiratory disease in children.

The LLM ChatGPT performed better than trainee doctors in assessing complex paediatric cases of respiratory disease, a study found, suggesting LLMs could be used to support patient triage.

UK researchers compared the performance of three LLMs (ChatGPT, Microsoft Bing and Google’s Bard) against early-career trainee doctors in responding to six paediatric respiratory clinical scenarios. Each scenario had no obvious diagnosis and no published evidence, guidelines or expert consensus pointing to a specific diagnosis or plan.

The 10 trainee doctors were each given an hour, with internet access but without access to LLMs, to answer each scenario in 200 to 400 words.

Responses were randomised and scored by six experts, both overall and against specific criteria: correctness, comprehensiveness, utility, plausibility, coherence and humanness.

ChatGPT (median overall score 7) outperformed Bard (median 6), Bing (median 4) and the trainee doctors (median 4) in all domains.

Bard scored better than the trainee doctors in coherence, with Bing and the trainee doctors scoring similarly.

The six experts were able to identify Bing’s and Bard’s responses as non-human, but not ChatGPT’s.

Dr Manjith Narayanan, lead author and consultant in paediatric pulmonology at the Royal Hospital for Children and Young People, Edinburgh, and honorary senior clinical lecturer at the University of Edinburgh, UK, said they did not find any obvious instances of ‘hallucinations’ – the term for false information provided by LLMs – in the responses.

‘Even though… we did not see any instance of hallucination… we need to be aware of this possibility and build mitigations against this,’ he said.

The research team plan to test LLMs against more senior doctors and investigate newer and more advanced versions of the technology.

Commenting on the findings, Professor Hilary Pinnock, ERS Education Council chair and professor of primary care respiratory medicine at the University of Edinburgh, said the study pointed to a ‘brave new world of AI-supported care’.

She added: ‘As the researchers have demonstrated, AI holds out the promise of a new way of working, but we need extensive testing of clinical accuracy and safety, pragmatic assessment of organisational efficiency, and exploration of the societal implications before we embed this technology in routine care.’

ERS Congress co-chair Professor Judith Löffler-Ragg said the research presented at this year’s event under the theme of ‘Humans and machines: getting the balance right’ was pioneering and should guide future developments.

‘It is extremely important that we view developments in technology, and specifically AI, with an open mind but also a critical eye,’ she said.

‘Our vision is to advance personalised medicine through the responsible use of AI, continuously improving respiratory medicine.’
