Mapping tumours with greater precision using SMMILe: expert commentary

SMMILe, a new artificial intelligence algorithm developed by Dr Zeyu Gao at the University of Cambridge, can analyse complex cancer images and map tumours with greater spatial accuracy, offering more reliable visual cues for pathologists and opening the door for clinicians to make more personalised treatment decisions.

Artificial intelligence (AI) tools have significant potential to support pathologists in analysing tissue samples from patients with suspected or confirmed cancer, generating ‘spatial maps’ that show where cancer cells are located and how they are distributed.

However, their development has typically relied on large numbers of high-quality, expertly annotated slides, and a key limitation remains: while many systems can accurately predict a case-level diagnosis, their visual heatmaps are often insufficiently reliable for clinical interpretation.

AI tools may focus too narrowly on the most obvious regions, overlook subtle tumour areas, or be misled by confounding tissue patterns. This makes them less useful when precise spatial detail is critical.

Accurate spatial mapping in cancer diagnosis

Reliable heatmaps are highly valuable in routine pathology because many diagnostic decisions depend not only on whether a tumour is present, but also on its location, extent and the distribution of histological patterns across the slide.

In everyday practice, pathologists often assess tumour margins, identify small infiltrative foci and interpret heterogeneous architecture rather than a single obvious hotspot.

When visual cues are consistent, reliable heatmaps can serve as a useful guide during review – for example, by highlighting subtle extension at the invasive front or low-volume tumour near a resection edge.

This is particularly relevant in grading systems such as the Gleason score in prostate cancer, where diagnosis depends on recognising and weighting different growth patterns across the tissue.

In tumours with mixed subtypes, improved spatial mapping could also support more objective estimation of the proportion of each component.
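As a simple illustration of what such an estimate involves, the sketch below counts patch-level labels and reports each subtype as a fraction of all tumour patches. The label names and the example slide are hypothetical, and the function is an assumption made for illustration rather than part of any published tool.

```python
from collections import Counter

def component_proportions(patch_labels, background="normal"):
    """Return the fraction of tumour patches assigned to each subtype.

    patch_labels: iterable of per-patch class names (hypothetical model output).
    background:   label treated as non-tumour and excluded from the denominator.
    """
    counts = Counter(label for label in patch_labels if label != background)
    total = sum(counts.values())
    if total == 0:
        return {}  # no tumour patches on this slide
    return {label: n / total for label, n in counts.items()}

# Hypothetical slide: 10 patches, of which 8 are tumour.
labels = ["normal", "acinar", "acinar", "acinar", "solid",
          "solid", "acinar", "normal", "acinar", "acinar"]
proportions = component_proportions(labels)
print(proportions)
```

The estimate is only as good as the patch-level map behind it, which is why spatial reliability matters for this kind of quantification.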

Effective integration of AI systems into clinical practice

Real-world pathology workflows vary at many levels, such as scanners, laboratories, staining conditions, tissue preparation pipelines, tumour types, and degrees of artefact or heterogeneity.

Robustness is what turns a promising algorithm into a potentially useful tool.

Our view is that generalisability across cancer types and visual variability is central to the translation of AI into practice, and a system that performs consistently across diverse scenarios is much more likely to fit into real-world workflows and prospective evaluation studies.

Effective implementation depends on integrating AI into routine workflows as a decision-support tool with clear human oversight and clear escalation pathways when the AI output and clinical impression do not align.

For this to happen, there needs to be local validation before deployment, including testing across the institution’s own scanners, staining practices and case mix.

Ongoing feedback between pathologists and developers is also needed, so that failure modes and any performance drift can be identified early and monitored. Outputs must also be intuitive and transparent, so that users can understand both predictions and their spatial rationale.

Developing and validating SMMILe

Taking these factors into account, research by our team at the University of Cambridge, in conjunction with colleagues at Xi’an Jiaotong and Northwestern Polytechnical Universities in China, and the National University of Singapore, has led to the development of Superpatch-based Measurable Multiple Instance Learning (SMMILe).

SMMILe is designed to improve how digital pathology models analyse whole-slide images. Unlike many existing tools that primarily predict a case-level diagnosis, SMMILe also aims to quantify where diagnostically relevant tissue patterns are located across a slide, making its outputs more spatially reliable and clinically meaningful.

Our approach addresses limitations of current multiple-instance learning methods that rely on general slide-level labels rather than detailed annotations. While effective for classification, these models often produce maps that miss subtle tumour regions or overemphasise only the most obvious features.
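For readers less familiar with multiple-instance learning, the following is a minimal sketch of generic attention-based pooling, in which a slide is treated as a 'bag' of patch embeddings and only the slide-level output is supervised. It is not SMMILe itself; the function name, the weight vectors and the random inputs are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()

def attention_mil_forward(patch_features, w_attn, w_clf):
    """Score one slide (a 'bag') from its patch embeddings.

    patch_features: (n_patches, dim) array of patch embeddings.
    w_attn:         (dim,) attention weights (hypothetical, normally learned).
    w_clf:          (dim,) classifier weights (hypothetical, normally learned).
    Returns the slide-level probability and the per-patch attention weights.
    """
    attention = softmax(patch_features @ w_attn)   # one weight per patch, sums to 1
    slide_embedding = attention @ patch_features   # attention-weighted average
    logit = slide_embedding @ w_clf
    probability = 1.0 / (1.0 + np.exp(-logit))
    return probability, attention

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))   # 6 patches, 4-dimensional embeddings
w_attn = rng.normal(size=4)
w_clf = rng.normal(size=4)
prob, attn = attention_mil_forward(features, w_attn, w_clf)
```

In a scheme like this, the attention weights are often displayed as the heatmap. Because the softmax can concentrate most of its weight on a few high-scoring patches, a correct slide-level prediction does not guarantee that every relevant region is highlighted.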

SMMILe builds on multiple-instance learning by analysing individual image patches and grouping them into ‘superpatches’ to better capture spatial context. It incorporates design features such as instance dropout, delocalised sampling and a refinement network to improve detection of both prominent and low-discriminative tissue patterns, while still being trained using only slide-level labels.
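Two of these ideas can be sketched in simplified form. The code below groups neighbouring patches into superpatches using coarse grid blocks, and hides a random subset of the highest-scoring patches so that a model would have to draw on less obvious regions as well. Both functions are one possible reading only: the grid grouping, the block size, the drop fraction and the 0.5 drop probability are assumptions, and the actual SMMILe implementation may differ.

```python
import numpy as np

def grid_superpatches(coords, block=2):
    """Assign each patch to a superpatch by coarse grid position.

    coords: (n_patches, 2) integer array of (row, col) patch positions.
    block:  superpatch side length in patches (assumed value).
    Grid blocks are a stand-in for whatever grouping SMMILe actually uses.
    """
    coarse = coords // block
    # return_inverse gives one integer id per distinct coarse block.
    _, superpatch_ids = np.unique(coarse, axis=0, return_inverse=True)
    return superpatch_ids.reshape(-1)

def drop_top_instances(scores, drop_fraction, rng):
    """Return a boolean keep-mask that randomly hides some top-scoring patches."""
    n_top = max(1, int(round(len(scores) * drop_fraction)))
    top_idx = np.argsort(scores)[-n_top:]        # indices of the highest scores
    keep = np.ones(len(scores), dtype=bool)
    dropped = top_idx[rng.random(n_top) < 0.5]   # drop each with probability 0.5
    keep[dropped] = False
    return keep

coords = np.array([(r, c) for r in range(4) for c in range(4)])  # 4 x 4 patch grid
superpatch_ids = grid_superpatches(coords, block=2)

rng = np.random.default_rng(0)
scores = rng.random(len(coords))   # hypothetical per-patch scores
keep = drop_top_instances(scores, drop_fraction=0.25, rng=rng)
```

Here the 16 patches fall into four superpatches of four patches each, and only patches among the four highest-scoring can be masked.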

SMMILe was tested on eight datasets comprising more than 3,000 whole-slide images covering breast, kidney, lung, ovarian, prostate and stomach cancers, and on multiple diagnostic tasks, including tumour detection, subtyping and grading.

Using fivefold cross-validation and comparing with nine leading methods, SMMILe consistently matched, and sometimes exceeded, state-of-the-art classification performance while delivering substantially improved spatial accuracy.
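For context, fivefold cross-validation divides the slides into five groups and tests on each group in turn while training on the other four. The sketch below shows the general idea with hypothetical slide identifiers; it does not reproduce how the folds were formed in the study, which is not described here.

```python
import random

def k_fold_splits(slide_ids, k=5, seed=0):
    """Yield (train_ids, test_ids) pairs so each slide is tested exactly once."""
    shuffled = list(slide_ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test_ids = folds[i]
        train_ids = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train_ids, test_ids

slide_ids = [f"slide_{n:03d}" for n in range(23)]  # hypothetical identifiers
splits = list(k_fold_splits(slide_ids))
```

In practice, folds are often formed at patient level rather than slide level, so that slides from the same patient never appear in both training and test sets.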

Potential clinical impact of SMMILe

From a clinical perspective, AI assistance is particularly valuable when the signal is real but weak – that is, not in classic textbook fields but in borderline or low-discriminative areas that still carry diagnostic weight. These regions often require the most time and cognitive effort from the pathologist, and reliable AI support could improve both sensitivity and consistency in slide review.

SMMILe may help bring ambiguous regions to attention earlier and clarify their spatial extent, particularly in borderline zones such as small suspicious clusters, tumour cells at the invasive front, or transition areas between subtypes or grades, where diagnosis, staging and treatment planning can be sensitive to interpretation.

In clinical oncology, metastasis detection is also a key area where SMMILe support could be very valuable, as screening large slides for small foci is repetitive, time-consuming and prone to oversight due to visual fatigue.

Tumour grading and mixed-subtype assessment are also important, particularly where diagnosis depends on identifying and quantifying multiple architectural patterns. In these settings, SMMILe may help make heterogeneous morphology more measurable and reproducible.

Looking to the future

Histology is not only diagnostic but also spatially informative. The distribution of tissue phenotypes within a tumour can reflect underlying biology, including aggressiveness, microenvironmental interactions and potentially treatment sensitivity.

The ability to quantify spatial patterns with tools such as SMMILe may support more personalised care by helping to identify clinically relevant subregions, refine inclusion criteria, inform patient stratification in clinical trials, direct exploratory analyses linking morphology to response and guide downstream molecular sampling.

In conclusion, SMMILe addresses a key limitation of existing computational pathology methods by combining strong whole-slide classification performance with accurate spatial quantification of tissue patterns.

Using only slide-level labels, it consistently matches or exceeds state-of-the-art approaches and provides more reliable localisation of diagnostically relevant regions across diverse cancer types and tasks.

More broadly, SMMILe may in future also facilitate downstream research applications, including biomarker discovery to improve cancer detection, and aid the investigation of spatial relationships between tissue phenotypes and clinical outcomes.

Author

Dr Zeyu Gao PhD
Postdoctoral researcher, Department of Oncology and the Early Cancer Institute, University of Cambridge, UK