A HYBRID CNN–TRANSFORMER FRAMEWORK FOR AUTOMATED SKIN CANCER DETECTION FROM DERMOSCOPIC IMAGES

Hamza A. Mashagba, Suhaila Abuowaida, Azlan B. Abd Aziz, Nawaf Alshdaifat, Mahmoud Baniata, Mardeni Bin Roslee, Mohamad Yusoff Alias, Azwan Mahmud


Article

Data sources: ZENODO

Publication type: Article (under curation). Publisher: Zenodo

doi: 10.5281/zenodo.19949266

Abstract

Early detection of melanoma and other skin cancers remains one of the most difficult challenges in clinical dermatology, because benign and malignant lesions often differ only subtly in appearance. In this research we introduce a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to overcome the limitations inherent in single-paradigm frameworks. The framework uses a pre-trained EfficientNet-B4 to extract hierarchical local features from each image and a multi-layer Vision Transformer to capture long-range spatial dependencies and global contextual information. To combine these two complementary representations, it applies a fusion module based on feature concatenation, multi-layer perceptron processing, and residual connections. We evaluated the hybrid architecture on the 33,126 dermoscopic images of the ISIC 2020 dataset using stratified 5-fold cross-validation. It outperformed the previous state-of-the-art model, a pre-trained EfficientNet-B4 with attention. Specifically, the hybrid architecture achieved 95.4% classification accuracy, 90.7% sensitivity, 95.1% specificity, and an AUC-ROC of 0.982. The gains in sensitivity and specificity over that baseline represent clinically relevant improvements in melanoma detection and in the reduction of false positives. These results indicate that combining CNN-based local texture analysis with transformer-based global semantic understanding yields a more accurate and robust computer-aided diagnosis system, one that can support clinicians in their decision-making and help improve patient outcomes.
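For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are conventionally derived from binary predictions. The function name `binary_metrics` and the toy labels are hypothetical illustrations and are not taken from the study or its data.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for 0/1 labels.

    Convention assumed here: 1 = malignant (positive), 0 = benign (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        # Fraction of all lesions classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
        # Fraction of malignant lesions that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of benign lesions correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical toy labels, NOT results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when malignant cases are rare, since a classifier can score high accuracy while still missing many positives.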
