Modern artificial intelligence (AI) models outperform less experienced physicians when diagnosing skin lesions but remain less accurate than expert dermatologists when tested using real-world cases, according to a large comparative study.

AI systems have shown promising performance in skin cancer detection under controlled conditions, but their effectiveness in routine clinical practice remains uncertain.

The researchers therefore assessed how different AI models performed on a broad range of common, rare and atypical skin lesions, compared with physicians with varying levels of dermatological experience.

Published in the journal JAMA Dermatology, the multi-institutional diagnostic study used the Test of Dermoscopy for International Validation platform, which contained 1,117 clinical cases representative of everyday dermatology practice. Each case included clinical and dermoscopic images, along with patient demographics, risk factors and lesion history.

The study compared three AI systems – a first-generation convolutional neural network (CNN) and two versions of the PanDerm foundation model – against 652 physicians who completed 1,092 diagnostic test iterations.

Dermatologist accuracy in skin cancer diagnosis

The primary outcome was multiclass accuracy across nine diagnostic categories. Secondary outcomes were sensitivity, specificity and balanced accuracy for distinguishing benign from malignant lesions.
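To make these outcome measures concrete, the sketch below shows one common way to compute them in Python. It assumes a hypothetical split of diagnostic categories into benign and malignant and uses made-up labels rather than study data. The category names and the helper functions `multiclass_accuracy` and `binary_metrics` are illustrative only and are not the study's evaluation code.

```python
# Minimal sketch: multiclass accuracy plus binary (malignant vs benign)
# sensitivity, specificity and balanced accuracy.
# ASSUMPTION: category names and the malignant set are hypothetical examples,
# not the study's nine diagnostic categories.

MALIGNANT = {"melanoma", "bcc", "scc"}  # hypothetical malignant categories


def multiclass_accuracy(y_true, y_pred):
    """Fraction of cases whose predicted category matches the reference diagnosis."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_metrics(y_true, y_pred, malignant=MALIGNANT):
    """Collapse categories to malignant/benign, then return
    (sensitivity, specificity, balanced accuracy)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        truly_malignant = t in malignant
        called_malignant = p in malignant
        if truly_malignant and called_malignant:
            tp += 1  # malignancy correctly flagged
        elif truly_malignant:
            fn += 1  # missed malignancy
        elif called_malignant:
            fp += 1  # benign lesion wrongly flagged
        else:
            tn += 1  # benign lesion correctly cleared
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    balanced = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced


# Hypothetical reference diagnoses and predictions (not study data)
y_true = ["melanoma", "nevus", "bcc", "seborrheic_keratosis", "nevus", "scc", "nevus"]
y_pred = ["nevus", "nevus", "bcc", "seborrheic_keratosis", "melanoma", "scc", "nevus"]

print(multiclass_accuracy(y_true, y_pred))
print(binary_metrics(y_true, y_pred))
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when one class (typically benign lesions) dominates the case mix.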

Overall, physicians achieved a mean diagnostic accuracy of 65.9%, compared with 56.7% for the CNN, 72.2% for the PanDerm unimodal model and 66.3% for the PanDerm multimodal model.

Expert dermatologists achieved the highest overall accuracy and outperformed all AI systems on the primary endpoint.

Among physicians, accuracy increased with dermoscopy experience, from 59.1% among those with less than one year of experience to 74.2% among those with more than 10 years.

The unimodal foundation model outperformed physicians with fewer than three years of experience and performed comparably to those with three to 10 years of experience, but it did not match the most experienced dermatologists.

Unexpectedly, the multimodal model, which incorporated clinical photographs and patient metadata, performed less well than the unimodal version. The authors suggested that this may reflect difficulties in integrating different data types or differences between the training and test datasets.

AI models supporting clinical expertise

When lesions were classified as benign or malignant, both foundation models achieved higher specificity and area under the receiver operating characteristic curve than physicians.
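The area under the receiver operating characteristic curve (AUC) can be read as the probability that a randomly chosen malignant lesion receives a higher malignancy score than a randomly chosen benign one. A minimal Python sketch of that interpretation follows. The labels and scores are hypothetical, the function name `roc_auc` is illustrative, and the pairwise loop is chosen for clarity rather than speed; it is not the study's evaluation code.

```python
# Minimal sketch: ROC AUC via its pairwise (Mann-Whitney) interpretation.
# ASSUMPTION: labels (1 = malignant, 0 = benign) and scores are hypothetical.


def roc_auc(labels, scores):
    """Probability that a malignant case (label 1) scores higher than a
    benign case (label 0); tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical malignancy scores from a classifier (not study data)
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))
```

Because AUC summarises ranking across all possible thresholds, a model can have a high AUC and high specificity while still missing some malignancies at the threshold actually used, which is consistent with physicians retaining the highest sensitivity.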

However, experienced physicians maintained the highest sensitivity, indicating they were better at avoiding missed malignancies. Physicians also outperformed AI models for several key malignancies, including melanoma and squamous cell carcinoma.

The authors suggested that foundation models represent a substantial advance over earlier CNN-based systems and may be particularly useful as decision-support tools for less experienced clinicians. Rather than replacing clinical expertise, AI could serve as a second reader, an educational aid or a triage tool that helps reduce diagnostic errors and supports training.

The authors acknowledged several limitations, including retrospective image collection, over-representation of diagnostically challenging cases, limited ethnic diversity in the dataset and a predominance of French physicians among participants.

The study also assessed AI and physicians separately and did not evaluate combined human–AI decision-making.

The authors concluded that expert dermatologists remain the reference standard for diagnosing skin cancer. Future research should focus on improving the generalisability of foundation models and on evaluating collaborative workflows that integrate AI support with clinical judgement, while maintaining investment in dermoscopy education and training.

Reference
Anriot J et al. Limits of artificial intelligence models for skin cancer diagnosis in realistic settings. JAMA Dermatol 2026; published online 3 June 2026: doi:10.1001/jamadermatol.2026.1492.