Most artificial intelligence systems used for breast cancer screening have been found, in a systematic review, to be less accurate than a single radiologist.
Globally, in 2020, an estimated 2.3 million women were diagnosed with breast cancer and there were 685,000 deaths from the disease. Fortunately, improvements in survival over recent decades have been attributed to population-based breast cancer screening with mammography. In fact, a recent UK study suggested that screening reduces breast cancer mortality by 38% among women screened at least once.
The use of artificial intelligence (AI) systems for image recognition in breast cancer screening could improve the detection of cases, either as a standalone system or as an aid to radiologists. Indeed, there is some evidence to support the value of AI: one retrospective analysis of an AI screening algorithm concluded that it showed better diagnostic performance than a radiologist. Nevertheless, a 2019 review concluded that while AI systems have good accuracy for breast cancer detection, methodological concerns and evidence gaps limit their translation into clinical breast cancer screening settings.
In light of these concerns, a team from the Division of Health Sciences, University of Warwick, UK, were commissioned by the UK National Screening Committee to undertake a systematic review to determine whether there was sufficient evidence to support the introduction of AI for mammographic image analysis in breast screening. They conducted literature searches up to May 2021 and included studies that reported the test accuracy of AI algorithms, either alone or in combination with radiologists, in detecting breast cancer in digital mammograms in screening practice or in test sets. The reference standard was cancer confirmed by histological analysis of biopsy samples, either from women who were referred for further tests after screening or from women with symptomatic presentation during follow-up.
The review identified a total of 12 studies including 131,822 women undergoing breast cancer screening. In studies with a standalone AI system, the algorithm calculated a cancer risk score and categorised women as either high (recall) or low (no recall) risk. When used to assist the radiologist, the AI system simply provided a level of suspicion. In two large retrospective studies including 76,813 women that compared the AI system with the clinical decisions of a radiologist, 96% of systems were less accurate than a single radiologist and all were less accurate than a double read.
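As a rough illustration of how a standalone system's risk score becomes a recall decision, and how test accuracy is then judged against the reference standard, the following is a minimal Python sketch. The risk scores, the 0.5 threshold, the cancer outcomes and the function names are all hypothetical and are not taken from the review or from any of the AI systems it assessed.

```python
def recall_decisions(risk_scores, threshold):
    """Categorise each woman as recall (True) or no recall (False)."""
    return [score >= threshold for score in risk_scores]


def screening_accuracy(decisions, has_cancer):
    """Return (sensitivity, specificity) against the reference standard."""
    pairs = list(zip(decisions, has_cancer))
    tp = sum(1 for d, c in pairs if d and c)          # recalled, cancer present
    fn = sum(1 for d, c in pairs if not d and c)      # not recalled, cancer present
    tn = sum(1 for d, c in pairs if not d and not c)  # not recalled, no cancer
    fp = sum(1 for d, c in pairs if d and not c)      # recalled, no cancer
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer and one non-cancer case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: AI risk scores and reference-standard outcomes.
scores = [0.91, 0.15, 0.62, 0.08, 0.77, 0.30]
cancer = [True, False, False, False, True, True]

decisions = recall_decisions(scores, threshold=0.5)
sensitivity, specificity = screening_accuracy(decisions, cancer)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Raising the threshold in a sketch like this would typically trade sensitivity for specificity, which is one reason the operating point matters when an AI system is compared with a radiologist's decision.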
Overall, the authors reported considerable heterogeneity in study methodology, some of which raised serious concerns over the risk of bias and applicability. In their study, they commented that “evidence is insufficient on the accuracy or clinical effect of introducing AI to examine mammograms anywhere on the screening pathway”.
In their conclusion, the authors noted that the current evidence on AI systems for breast cancer screening is a long way from having the quality and quantity required for implementation into clinical practice.
Freeman K et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ 2021