A study shows that machine learning-generated treatment plans for patients with prostate cancer, while accurate, were less likely to be chosen by physicians in real clinical practice than in a simulated setting.
Advancements in machine-learning (ML) algorithms in medicine have demonstrated that such systems can be as accurate as humans. However, few systems have been used in routine clinical practice; ML systems are often tested in parallel with physicians, and the actions they suggest are not acted upon in practice. Fully utilising ML systems in routine clinical care requires a shift from their current adjunctive support role to being considered the primary option.
To assess the real-world value of an ML algorithm, a team from the Princess Margaret Cancer Centre, Ontario, Canada, explored ML-generated curative-intent radiation therapy (RT) treatment planning for patients with prostate cancer. The team’s overall aim was to evaluate the integration of the ML system as a standard of care, and they undertook a two-stage study that progressed from an initial feasibility (validation) phase to clinical deployment. For the initial validation phase, the team included data from 50 patients to assess ML performance retrospectively. The researchers presented ML-generated RT plans to reviewers and asked them to assess these plans, in a blinded fashion, against the plans actually used for each patient. In the subsequent deployment phase, again with 50 patients, physician-generated and ML-generated plans were compared prospectively, with the treating physician again blinded to the source of each plan.
The ML system generated plans much faster than the equivalent human-driven process (median 47 vs 118 hours, p < 0.01). Overall, ML-generated plans were deemed clinically acceptable for treatment in 89% of cases across the validation and deployment phases (92% during the validation phase and 86% during the deployment phase). In only 10 cases was the ML-generated method deemed not applicable, because the plans required consultation with the treating physician, which unblinded the review process. In addition, 72% of ML-generated RT plans were selected over human-generated RT plans in a head-to-head comparison. However, from the simulation phase to the deployment phase, the proportion of ML-generated plans used by the treating physician fell from 83% to 61% (p = 0.02).
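As a quick arithmetic check on the acceptability figures, the short sketch below shows how the two per-phase rates combine into the overall 89%. It assumes that each phase contributed 50 evaluated patients, as the article states for the phase sizes; the study's exact denominators may differ, and the function name `pooled_rate` is purely illustrative.

```python
# Sketch only: assumes 50 evaluated patients per phase (phase sizes taken
# from the article); the study's exact denominators may differ.
def pooled_rate(rates_and_counts):
    """Return the patient-weighted percentage across phases."""
    accepted = sum(round(rate * n) for rate, n in rates_and_counts)
    total = sum(n for _, n in rates_and_counts)
    return 100 * accepted / total

# (acceptance rate, number of patients) for validation and deployment
phases = [(0.92, 50), (0.86, 50)]
overall = pooled_rate(phases)
print(f"Pooled acceptability: {overall:.0f}%")
```

With equal phase sizes the pooled figure is simply the mean of the two rates, which matches the reported overall value.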
The authors were unable to fully account for this difference. They suggested that neither retrospective nor simulated studies can fully recapitulate the factors influencing clinical decision-making when patient care is at stake, and concluded that further prospective deployment studies are required to validate and quantify the impact of ML in real-world clinical settings.
McIntosh C et al. Clinical integration of machine learning for curative-intent radiation treatment of patients with prostate cancer. Nat Med 2021