XAI-Enabled Equivariant Vision Transformer for Pediatric Pneumonia Detection
by Dr. Thasni T, Hiba Tasnim
Published: April 17, 2026 • DOI: 10.51244/IJRSI.2026.1303000211
Abstract
Pneumonia remains the leading infectious cause of death in children under five, claiming over one million lives annually, with diagnostic delays in low-resource settings often exacerbated by rotated, flipped, or poorly aligned chest X-rays acquired from restless infants. Although convolutional neural networks (CNNs) and standard Vision Transformers (ViTs) have driven automated detection accuracies beyond 95% on benchmark datasets, their performance degrades significantly in real-world pediatric imaging due to limited geometric invariance and a lack of clinical interpretability. This survey reviews advances reported in recent years in deep learning approaches for pediatric pneumonia detection from chest X-rays, covering CNN-based hierarchies, global-context Vision Transformers, multimodal fusion with clinical markers, emerging equivariant transformer designs, and explainable AI techniques. Despite substantial progress in accuracy and sensitivity, persistent challenges include orientation sensitivity, reliance on heavy data augmentation or non-imaging inputs, interpretability that is post hoc rather than intrinsic, and limited standalone deployment in resource-constrained environments. The analysis highlights the need for geometrically robust, intrinsically interpretable, and clinically deployable models to bridge the gap between benchmark performance and reliable real-world pediatric screening.