Document Type
Article
Keywords
Deep learning, Vision transformers, Chest X-ray (CXR), Convolutional neural network (CNN)
Abstract
Chest X-rays (CXRs) are an important tool for diagnosing thoracic diseases because of their cost-effectiveness and accessibility. However, subtle disease signs, anatomical overlap, and a shortage of expert radiologists in some regions make CXRs difficult to read, highlighting the need for accurate automated diagnostic systems. Existing deep learning methods fall short because they require pixel-level annotations to train well. To address this problem, this study proposes a Vision Transformer (ViT) with transfer learning for multiclass classification and weakly supervised localization of thoracic diseases in chest X-ray images. The technique combines patch extraction, positional embedding, and transformer encoding in a single pipeline. A Multi-Layer Perceptron (MLP) head and score-weighted class activation mapping (Score-CAM) are then applied to make prediction efficient. This approach improves diagnostic accuracy and eases the detection of small lesions in experiments on the ChestX-ray14 dataset. The results show reliable accuracies: 0.83 for cardiomegaly, 0.85 for edema, 0.80 for consolidation, 0.87 for pleural effusion, 0.75 for atelectasis, and 0.89 for pneumonia. This study shows how Vision Transformers could help physicians detect thoracic diseases earlier and make better decisions.
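The ViT input stage summarized in the abstract (patch extraction followed by linear and positional embedding, feeding the transformer encoder) can be sketched as follows. This is a minimal illustration only: the 224x224 input size, 16x16 patch size, and 768-dimensional embedding are common ViT-Base defaults assumed here, not values stated by the article, and the random weights stand in for learned parameters.

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an HxW grayscale image into flattened non-overlapping patches."""
    h, w = image.shape
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size)
    # Reorder so each row is one flattened patch of patch_size*patch_size pixels.
    return patches.transpose(0, 2, 1, 3).reshape(-1, patch_size * patch_size)

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224))       # stand-in for a preprocessed CXR

patches = patchify(image)                     # (196, 256): 14x14 patches of 16x16 pixels

embed_dim = 768                               # ViT-Base hidden size (assumed)
w_embed = rng.standard_normal((256, embed_dim)) * 0.02    # patch projection (would be learned)
pos_embed = rng.standard_normal((196, embed_dim)) * 0.02  # positional embeddings (would be learned)

# Linearly embed each patch and add its positional embedding; the result
# is the token sequence fed to the transformer encoder, whose output is
# pooled and passed to the MLP classification head.
tokens = patches @ w_embed + pos_embed        # (196, 768)
print(tokens.shape)
```

In the full model, a class token is typically prepended to this sequence and the encoder output for that token is passed to the MLP head for multiclass prediction, while Score-CAM operates on intermediate activations for localization.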
How to Cite This Article
Talab, Mohammed Ahmed; AL-Obaidi, Sumia Abdulhussien Razooqi; Hammadi, Othman I.; Awang, Suryanti; Nafis, Nur Syafiqah Mohd; and Kahtan, Hasan (2026) "Vision Transformer Model for Multiclass Classification and Localization of Thoracic Abnormalities in Chest X-rays," Mesopotamian Journal of Big Data: Vol. 6: Iss. 1, Article 3.
Available at: https://map.researchcommons.org/mjbd/vol6/iss1/3