Take a look at a selection of our recent media coverage:
6th April 2022
Fracture detection rates are comparable for artificial intelligence (AI) and clinicians, according to the findings of a meta-analysis by researchers from the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Oxford, UK.
Fractures are a common reason for admission to hospital around the world, although research suggests that fracture rates have stabilised. For example, one 2019 UK-based study observed that the risk of admission for a fracture between 2004 and 2014 was 47.8 per 10,000 population and that the rate of fracture admission remained stable over that period. Unfortunately, fractures are not always detected on first presentation: in one two-year study, 1% of all visits resulted in an error in fracture diagnosis and 3.1% of all fractures were not diagnosed at the initial visit. One solution to improve the diagnostic accuracy of fracture detection is the use of artificial intelligence systems and, in particular, machine learning, which enables algorithms to learn from data. Related to machine learning is deep learning, a more sophisticated approach that uses complex, multi-layered “deep neural networks”. Deep learning systems hold great potential for the detection of fractures and, in a 2020 review, the authors concluded that deep learning was reliable in fracture diagnosis and had a high diagnostic accuracy.
For the present meta-analysis, the Oxford team further assessed and compared the diagnostic performance of AI and clinicians on both radiographs and computed tomography (CT) images in fracture detection. The team searched for studies that developed and/or validated a deep learning algorithm for fracture detection and assessed AI vs clinician performance during both internal and external validation. The team analysed receiver operating characteristic curves to determine both sensitivity and specificity.
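Sensitivity and specificity of the kind pooled in such an analysis come from each study's confusion matrix. A minimal sketch, using hypothetical counts rather than figures from the paper:

```python
def sensitivity(tp, fn):
    # proportion of true fractures the reader (AI or clinician) detects
    return tp / (tp + fn)

def specificity(tn, fp):
    # proportion of fracture-free images correctly ruled out
    return tn / (tn + fp)

# hypothetical counts for a single study: true/false positives and negatives
tp, fn, tn, fp = 92, 8, 91, 9
print(f"sensitivity={sensitivity(tp, fn):.2f}, specificity={specificity(tn, fp):.2f}")
```

A meta-analysis then pools these per-study values (with their confidence intervals) across all included studies.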
Fracture detection rates of AI and clinicians
A total of 42 studies with a median of 1169 participants were included: 37 assessed fractures detected on radiographs and 5 on CT. Of these, 16 studies compared the performance of AI against expert clinicians, 7 against both experts and non-experts, and one compared AI against non-experts only.
When evaluating AI and clinician performance in studies of internal validation, the pooled sensitivity was 92% (95% CI 88 – 94%) for AI and 91% (95% CI 85 – 95%) for clinicians. The pooled specificity values were also broadly similar, at 91% for AI and 92% for clinicians.
For studies looking at external validation, the pooled sensitivity for AI was 91% (95% CI 84 – 95%) and 94% (95% CI 90 – 96%) for clinicians on matched sets. The specificity was slightly lower for AI compared to clinicians (91% vs 94%).
The authors concluded that AI and clinicians had comparable reported diagnostic performance in fracture detection and suggested that AI technology has promise as a diagnostic adjunct in future clinical practice.
Kuo RYL et al. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis. Radiology 2022.
13th December 2021
The diagnostic performance of convolutional neural networks (CNNs) in patients with an intracranial haemorrhage (ICH) appears to be comparable to that of radiologists. This was the conclusion of a study by a team from the Faculty of Health and Medical Sciences, Copenhagen University, Denmark.
An ICH is usually caused by rupture of small penetrating arteries secondary to hypertensive changes or other vascular abnormalities and overall accounts for 10 – 20% of all strokes. However, this proportion varies across the world so that in Asian countries, an ICH is responsible for between 18 and 24% of strokes but only 8 – 15% in Westernised countries. An acute presentation of ICH can be difficult to distinguish from ischaemic stroke and non-contrast computerised tomography (CT) is the most rapid and readily available tool for the diagnosis of ICH.
As in many areas of medicine, artificial intelligence systems are increasingly used, and one such system is the convolutional neural network (CNN), a deep learning algorithm that is able to take an input image, assign importance to various aspects or objects within the image and differentiate one from another. In fact, a 2019 systematic review of deep learning systems concluded that the ‘diagnostic performance of deep learning models to be equivalent to that of health-care professionals.’ Nevertheless, the authors added the caveat that ‘few studies presented externally validated results or compared the performance of deep learning models and health-care professionals using the same sample.’
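To illustrate the kind of operation a CNN layer performs, the sketch below implements a 2-D convolution in plain Python: a small kernel slides over the image and produces a feature map whose values indicate how strongly each region matches the kernel's pattern. This is illustrative only; a real CNN stacks many such layers with learned kernels.

```python
def conv2d(image, kernel):
    # valid-mode convolution (strictly, cross-correlation, as in CNN libraries)
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # dot product of the kernel with the image patch at (i, j)
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# a 2x2 identity-diagonal kernel applied to a 2x2 image yields one value
print(conv2d([[1, 2], [3, 4]], [[1, 0], [0, 1]]))  # [[5]]
```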
In the present study, the Danish team undertook a systematic review and meta-analysis to appraise the evidence for CNNs in the per-patient diagnosis of ICH. They performed a literature review, and studies deemed suitable for inclusion were those in which: patients had undergone non-contrast computed tomography of the cerebrum for the detection of an ICH; radiologists or a clinical report served as the reference standard; and a CNN algorithm was deployed for the detection of ICH. For the purposes of their analysis, the minimum acceptable reference standard was defined as manual, semi-automated or automated image labelling taken from radiology reports or electronic health records. The researchers calculated the pooled sensitivity, specificity and the summary receiver operating characteristic (SROC) curve.
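The area under an ROC curve can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of this rank-based (Mann–Whitney) formulation, with hypothetical scores rather than study data:

```python
def roc_auc(pos_scores, neg_scores):
    # probability that a random positive outscores a random negative,
    # with ties counted as half -- equivalent to the area under the ROC curve
    pairs = len(pos_scores) * len(neg_scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / pairs

# hypothetical model scores for scans with and without a haemorrhage
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```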
A total of six studies with 380,382 scans were included in the final analysis. When comparing CNN performance against the reference standard, the pooled sensitivity was 96% (95% CI 93 – 97%), the pooled specificity 97% (95% CI 90 – 99%) and the area under the SROC curve 98% (95% CI 97 – 99%). When retrospective and external validation studies were combined, CNN performance was slightly worse, with a pooled sensitivity of 95%, specificity of 96% and area under the SROC curve of 98%.
They concluded that CNN algorithms accurately detect ICH based on an analysis of both retrospective and external validation studies and that this approach seems promising, but highlighted the need for more studies using external validation test sets with uniform methods to define a more robust reference standard.
Jorgensen MD et al. Convolutional neural network performance compared to radiologists in detecting intracranial hemorrhage from brain computed tomography: A systematic review and meta-analysis. Eur J Radiol 2021
15th October 2021
The risk of breast cancer is increased among women with denser breasts, yet mammography can often miss cases in these women. A 2019 trial found that the use of supplementary breast MRI screening in women with dense breasts led to the diagnosis of significantly fewer interval cancers than mammography alone. However, screening programmes involve a huge number of women, and many breast MRI scans of women with dense breasts show normal anatomical and physiological variation and therefore may not require radiological review.
A team from the Department of Radiology, University of Utrecht, therefore wondered whether an automated deep learning (DL) system could triage out normal breast MRI scans without cancer and so reduce the workload of radiologists. The team undertook a secondary analysis of data from the prospective Dense Tissue and Early Breast Neoplasm Screening (DENSE) trial. The DL system was trained on left and right breasts separately and the results were combined so that it was able to differentiate between breasts with and without lesions. The performance of the DL system was assessed using receiver operating characteristic (ROC) curves.
A total of 4581 breast MRI examinations of extremely dense breasts from 4581 women with a mean age of 54.3 years were included in the analysis. Of these 9162 breasts, 838 had at least one lesion, of which 77 were malignant. The area under the ROC curve in differentiating between a normal breast MRI and an examination with lesions was 0.83 (95% CI 0.80 – 0.85). The DL system classified 90.7% (95% CI 86.7 – 94.7) of the MRI examinations with lesions as non-normal, which would therefore be triaged to radiologist review. In addition, the DL system dismissed 39.7% of the MRI examinations without lesions but did not miss any cases of malignant disease.
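The triage step described above can be sketched as a simple thresholding rule. Note that the max-combination of per-breast scores and the threshold value here are illustrative assumptions for the sketch, not details taken from the paper:

```python
def triage_exam(score_left, score_right, threshold=0.25):
    # combine per-breast lesion scores (the study trained on each breast
    # separately and combined the results); taking the max is one plausible rule
    exam_score = max(score_left, score_right)
    # exams at or above the (illustrative) threshold go to a radiologist;
    # the rest are dismissed as normal, reducing radiologist workload
    return "radiologist review" if exam_score >= threshold else "dismiss as normal"

print(triage_exam(0.05, 0.60))  # suspicious right breast triggers review
print(triage_exam(0.05, 0.10))  # both breasts look normal: dismissed
```

In practice the threshold would be chosen on validation data so that essentially no malignant cases are dismissed, as in the trial's reported results.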
Commenting on their findings, the authors recognised a limitation in that their results were from the first round of the DENSE trial and that the number of lesions detected in subsequent screening rounds was smaller. Thus, they planned to further validate the performance of the model on data from subsequent rounds. The authors also suggested that future trials need to focus on demonstrating that the DL system is at least as effective as an expert radiologist at dismissing normal MRI examinations.
Verburg E et al. Deep Learning for Automated Triaging of 4581 Breast MRI Examinations from the DENSE Trial. Radiology 2021