Artificial intelligence (AI) systems and clinicians showed comparable diagnostic performance in a systematic review and meta-analysis of prospective melanoma studies, but the authors say broader validation is needed.

Dermoscopy remains the standard non-invasive method for assessing suspected melanoma, while AI is increasingly explored as a decision-support tool. However, more evidence is needed on how AI performance compares with that of dermatologists.

A recent analysis published in JAMA Dermatology addresses this gap by examining prospective studies conducted in real-world clinical settings, as earlier retrospective studies may have overestimated AI performance.

The review evaluated 11 prospective studies conducted in clinical settings using dermoscopy, involving more than 2,500 patients and over 50 dermatologists. Eligible studies included adult patients with lesions suspicious for melanoma and required histopathology as the reference standard.

Nine studies reported performance metrics for dermatologists, five evaluated AI alone, and only one examined dermatologists supported by AI. Sample sizes varied widely across the heterogeneous study populations, with the number of malignant melanoma cases ranging from 26 to 653 and the number of non-malignant lesions ranging from 88 to 4,495.

Primary outcomes were diagnostic sensitivity, specificity, accuracy and balanced accuracy for melanoma detection. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) and QUADAS-Comparative (QUADAS-C) tools.
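
For readers less familiar with these metrics, the sketch below shows how they are conventionally computed from a 2x2 table of test results against histopathology. The function name and the lesion counts are hypothetical and chosen purely for illustration; they are not data from the review.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from 2x2 confusion-matrix counts.

    tp: melanomas correctly flagged      fn: melanomas missed
    tn: benign lesions correctly cleared fp: benign lesions wrongly flagged
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Balanced accuracy averages the two rates, so it is not dominated
    # by whichever class (melanoma or non-melanoma) is more common.
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts (NOT from the review): 100 melanomas, 400 benign lesions.
metrics = diagnostic_metrics(tp=80, fn=20, tn=300, fp=100)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because the included studies differed widely in their ratio of melanomas to non-malignant lesions, balanced accuracy is a useful complement to raw accuracy, which shifts with that ratio.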

Comparable melanoma detection with gains from AI support

Pooled results showed that dermatologists achieved a sensitivity of 78.6% (95% CI 67.5–88.1%) and a specificity of 75.3% (95% CI 63.3–84.3%).

AI systems showed comparable performance, with a sensitivity of 80.9% (95% CI 63.6–94.5%) and a specificity of 75.6% (95% CI 64.5–85.6%).

Accuracy and balanced accuracy were also comparable between groups, with dermatologists achieving 75.3% and 77.4%, respectively, compared with 73.3% and 78.3% for AI.
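
As a rough consistency check, balanced accuracy is conventionally defined as the mean of sensitivity and specificity. The sketch below applies that definition to the pooled figures reported in the article. It assumes the review uses the conventional definition. Because each metric is pooled separately in a meta-analysis, the derived values need not match the reported ones exactly. The variable and function names are illustrative.

```python
# Pooled point estimates as reported in the article (percentages).
pooled = {
    "dermatologists": {
        "sensitivity": 78.6,
        "specificity": 75.3,
        "reported_balanced_accuracy": 77.4,
    },
    "AI": {
        "sensitivity": 80.9,
        "specificity": 75.6,
        "reported_balanced_accuracy": 78.3,
    },
}


def mean_of_sens_spec(sensitivity, specificity):
    """Conventional balanced accuracy: the average of the two rates."""
    return (sensitivity + specificity) / 2


derived = {}
for group, m in pooled.items():
    derived[group] = mean_of_sens_spec(m["sensitivity"], m["specificity"])
    gap = m["reported_balanced_accuracy"] - derived[group]
    print(
        f"{group}: derived {derived[group]:.2f}%, "
        f"reported {m['reported_balanced_accuracy']:.1f}%, gap {gap:+.2f} points"
    )
```

Under these assumptions, the derived values fall within about half a percentage point of the reported balanced accuracies for both groups.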

Notably, the single study evaluating AI-assisted dermatologists found higher performance, with a sensitivity of 91.9% and a specificity of 83.7%, suggesting potential benefits of combining human and machine decision-making.

In direct head-to-head comparisons, AI consistently demonstrated higher specificity than dermatologists, while sensitivity was similar or slightly lower. This pattern may reflect dermatologists’ more cautious clinical decision-making, as they are more likely to recommend a biopsy in uncertain cases.

Implications for diagnosis in clinical practice

Despite the encouraging findings, the authors highlighted several limitations. Most of the included studies showed a high risk of bias, particularly in patient selection: nine of the 11 studies pre-selected lesions already suspected of being melanoma. This limits generalisability to routine clinical practice, where a broader range of lesions is encountered.

Reliance on binary classification (melanoma vs non-melanoma) may further oversimplify the diagnostic process and fail to capture the nuanced decision-making required in real-world settings, underscoring the need for more comprehensive evaluation of AI.

The findings nonetheless support the potential role of AI as a decision-support tool for melanoma detection, which could help to reduce unnecessary biopsies while maintaining diagnostic accuracy.

However, as clinical validation is still in its early stages, the authors cautioned that future research should focus on larger, multicentre prospective studies using unselected patient populations to better reflect real-world practice and confirm safety, reliability and clinical utility.

Reference
Laiouar-Pedari S et al. Prospective evidence on artificial intelligence-assisted melanoma diagnostics: a systematic review and meta-analysis. JAMA Dermatol 2026;March 25:e260217.