According to a meta-analysis, the fracture detection performance of artificial intelligence systems and clinicians are broadly equivalent
The fracture detection rates are comparable for artificial intelligence (AI) and clinicians according to the findings of a meta-analysis by researchers from the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Oxford, UK.
Fractures represent a common reason for admission to hospital around the world. However, research suggests that fortunately, fracture rates have stabilised. For example, one 2019 UK-based study observed that the risk of admission for a fracture between 2004 and 2014 was 47.8 per 10,000 population but that the rate of fracture admission remained stable. Unfortunately, however, fractures are not always detected on first presentation as witnessed by a two-year study in which 1% of all visits resulted in an error in fracture diagnosis and 3.1% of all fractures were not diagnosed at the initial visit. One solution to improve upon the diagnostic accuracy of fractures is the use of artificial intelligence systems and in particular, machine learning, which enables algorithms to learn from data. Related to machine learning is deep learning, which is a more sophisticated approach to machine learning that uses complex, multi-layered “deep neural networks. Deep learning systems hold great potential for the detection of fractures and in a 2020 review, the authors concluded that deep learning was reliable in fracture diagnosis and had a high diagnostic accuracy.
For the present meta-analysis, the Oxford team further assessed and compared the diagnostic performance of AI and clinicians on both radiographs and computed tomography (CT) images in fracture detection. The team searched for studies that developed and or validated a deep learning algorithm for fracture detection and assessed AI vs clinician performance during both internal and external validation. The team analysed receiver operating characteristic curves to determine both sensitivity and specificity.
Fracture detection rates of AI and clinicians
A total of 42 studies with a median number of 1169 participants were included, 37 of which included fractures detected on radiographs and 5 with CT. A total of 16 studies compared the performance of the AI against expert clinicians, 7 to experts and non-experts and one compared AI to non-experts.
When evaluating AI and clinician performance in studies of internal validation, the pooled sensitivity was 92% (95%CI 88 – 94%) for AI and 91% (95% CI 85 – 95%) for clinicians. The pooled specificity values were also broadly similar with a value of 91% of AI and 92% for clinicians.
For studies looking at external validation, the pooled sensitivity for AI was 91% (95% CI 84 – 95%) and 94% (95% CI 90 – 96%) for clinicians on matched sets. The specificity was slightly lower for AI compared to clinicians (91% vs 94%).
The authors concluded that AI and clinicians had comparable reported diagnostic performance in fracture detection and suggested that AI technology has promise as a diagnostic adjunct in future clinical practice.
Kuo RYL et al. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis Radiology 2022