Hybrid CNN–Vision Transformer Architecture for Accurate Liver Cancer Diagnosis from Medical Imaging
DOI: https://doi.org/10.5281/zenodo.20047881
Keywords: Hybrid Deep Learning, CNN, Vision Transformer, Liver Cancer, Medical Imaging, Feature Fusion, Transfer Learning
Abstract
Detecting liver cancer early remains challenging because medical images vary widely between patients, and differences in scan contrast are often subtle. This study describes a hybrid model that combines a convolutional neural network (CNN) with a Vision Transformer (ViT) to capture both fine local image detail and broader contextual information. The CNN focuses on nearby visual signals such as edges and textures, while the transformer analyzes the full image to learn longer-range relationships between different regions. The method is evaluated on public datasets, including LiTS and TCGA-LIHC, with consistent preprocessing applied across all data. The reported accuracy is 94.8%, higher than that of models using only a CNN or only a transformer. These findings indicate that leveraging both local and global features may improve performance in liver cancer detection.
License
Copyright (c) 2026 Satyendra Sharma, Pradeep Laxkar (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
