Document Type
Article
Keywords
Deep learning, Vision transformers, Chest X-ray (CXR), Convolutional neural network (CNN)
Abstract
Chest X-rays (CXRs) are an important tool for diagnosing thoracic diseases because of their cost-effectiveness and accessibility. However, subtle disease signs, anatomical overlap, and a shortage of expert radiologists in some regions make CXRs difficult to read, highlighting the need for accurate automated diagnostic systems. Existing deep learning methods fall short because they require pixel-level annotations to train well. To address this problem, this study proposes a Vision Transformer (ViT) with transfer learning for multiclass classification and weakly supervised localization of thoracic diseases in chest X-ray images. The technique combines patch extraction, positional embedding, and transformer encoding in a single pipeline. A Multi-Layer Perceptron (MLP) head and score-weighted class activation mapping (Score-CAM) are then applied to make prediction efficient. This approach improves diagnostic accuracy and eases the detection of small lesions in experiments on the ChestX-ray14 dataset. The results show reliable accuracies: 0.83 for cardiomegaly, 0.85 for edema, 0.80 for consolidation, 0.87 for pleural effusion, 0.75 for atelectasis, and 0.89 for pneumonia. This study shows how Vision Transformers could help physicians detect thoracic diseases earlier and make better decisions.
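The ViT input stage summarized in the abstract (patch extraction followed by linear and positional embedding, feeding the transformer encoder) can be sketched as follows. This is a minimal illustration only: the 224x224 input size, 16x16 patch size, and 768-dimensional embedding are common ViT-Base defaults assumed here, not values stated by the article, and the random weights stand in for learned parameters.

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an HxW grayscale image into flattened non-overlapping patches."""
    h, w = image.shape
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size)
    # Reorder so each row is one flattened patch of patch_size*patch_size pixels.
    return patches.transpose(0, 2, 1, 3).reshape(-1, patch_size * patch_size)

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224))       # stand-in for a preprocessed CXR

patches = patchify(image)                     # (196, 256): 14x14 patches of 16x16 pixels

embed_dim = 768                               # ViT-Base hidden size (assumed)
w_embed = rng.standard_normal((256, embed_dim)) * 0.02    # patch projection (would be learned)
pos_embed = rng.standard_normal((196, embed_dim)) * 0.02  # positional embeddings (would be learned)

# Linearly embed each patch and add its positional embedding; the result
# is the token sequence fed to the transformer encoder, whose output is
# pooled and passed to the MLP classification head.
tokens = patches @ w_embed + pos_embed        # (196, 768)
print(tokens.shape)
```

In the full model, a class token is typically prepended to this sequence and the encoder output for that token is passed to the MLP head for multiclass prediction, while Score-CAM operates on intermediate activations for localization.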
How to Cite This Article
Talab, Mohammed Ahmed; AL-Obaidi, Sumia Abdulhussien Razooqi; Hammadi, Othman I.; Awang, Suryanti; Nafis, Nur Syafiqah Mohd; and Kahtan, Hasan (2026) "Vision Transformer Model for Multiclass Classification and Localization of Thoracic Abnormalities in Chest X-rays," Mesopotamian Journal of Big Data: Vol. 6: Iss. 1, Article 3.
Available at: https://map.researchcommons.org/mjbd/vol6/iss1/3