Glaucoma Diagnosis Integrating Discriminative Optic Features with Clinical Domain Knowledge Using Deep Learning
by M. Sulthan Ibrahim, Z. Abdul Basith
Published: May 29, 2026 • DOI: 10.51244/IJRSI.2026.1305000084
Abstract
Glaucoma often goes undiagnosed until its later stages because of its asymptomatic onset, making it a leading cause of irreversible blindness worldwide. To overcome this clinical barrier, we propose a novel deep learning framework that combines key indicators from multimodal retinal imaging with important clinical risk factors to enable early, accurate, and interpretable diagnosis of glaucoma. We developed a clinician-guided multimodal convolutional neural network that jointly processes fundus images, optical coherence tomography (OCT) scans (including B-scans and retinal nerve fiber layer (RNFL) thickness maps), and clinical variables (intraocular pressure, age, family history, and ethnicity). The model employs a constrained attention-based fusion mechanism, informed by European Glaucoma Society guidelines, to prioritize ophthalmologically relevant features.
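The abstract does not specify how the fusion constraint is implemented. As a minimal sketch of one possible reading, the Python snippet below assumes that each modality (fundus, OCT, clinical variables) yields a fixed-length embedding, that raw attention scores are learned elsewhere, and that the clinical guidance is encoded as a minimum attention weight per modality. The function name `constrained_attention_fusion`, the floor values, and the toy embeddings are all hypothetical and are not taken from the study.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def constrained_attention_fusion(embeddings, scores, min_weight):
    """Fuse per-modality embeddings with attention weights that respect floors.

    embeddings: dict modality -> list of floats (all the same length)
    scores:     dict modality -> raw attention score (learned in a real model)
    min_weight: dict modality -> assumed clinician-specified floor; floors sum to < 1
    """
    names = list(embeddings)
    floor_total = sum(min_weight[n] for n in names)
    if floor_total >= 1.0:
        raise ValueError("minimum weights must sum to less than 1")
    # Softmax distributes the remaining (1 - floor_total) mass across modalities.
    free = softmax([scores[n] for n in names])
    weights = {
        n: min_weight[n] + (1.0 - floor_total) * f for n, f in zip(names, free)
    }
    dim = len(embeddings[names[0]])
    fused = [sum(weights[n] * embeddings[n][d] for n in names) for d in range(dim)]
    return fused, weights


# Toy inputs (hypothetical values, for illustration only).
embeddings = {
    "fundus": [0.2, 0.5, 0.1],
    "oct": [0.7, 0.1, 0.4],
    "clinical": [0.3, 0.3, 0.9],
}
scores = {"fundus": 1.2, "oct": 2.0, "clinical": 0.1}
min_weight = {"fundus": 0.2, "oct": 0.2, "clinical": 0.1}

fused, weights = constrained_attention_fusion(embeddings, scores, min_weight)
print(weights)
```

With this construction the weights always sum to 1 and no modality can fall below its floor, so a low learned score cannot silence the clinical variables entirely. Other ways of constraining attention, such as penalty terms in the loss, would be equally consistent with the description above.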
The framework was evaluated on a rigorously annotated pilot cohort of 10 South Indian patients (5 primary open-angle glaucoma cases, 5 healthy controls) from a tertiary eye care center in Chennai. Ground truth was established by consensus of two fellowship-trained glaucoma specialists using comprehensive clinical evaluation per Hodapp-Parrish-Anderson criteria and Humphrey visual field testing. Performance was assessed via leave-two-out cross-validation, with 95% confidence intervals estimated through 1,000 bootstrap iterations. Our model achieved 90% accuracy (95% CI: 78–97%), 100% sensitivity (95% CI: 92–100%), 80% specificity (95% CI: 64–92%), and an AUC-ROC of 0.95 (95% CI: 0.88–0.99), outperforming unimodal baselines (fundus-only AUC = 0.88; OCT-only AUC = 0.90) and a late-fusion ensemble (AUC = 0.91). Ablation studies confirmed that integrating clinical metadata improved accuracy by 5 percentage points and reduced error rates by 50%. Grad-CAM visualizations demonstrated anatomically plausible attention patterns aligned with known glaucomatous damage zones (e.g., inferior/superior neuroretinal rim and RNFL thinning). This work presents three key innovations: (1) the first deep learning architecture for glaucoma that embeds clinician-specified constraints into the multimodal fusion process, ensuring that diagnostic reasoning aligns with established ophthalmological principles; (2) a proof of concept showing that domain-informed fusion of fundus, OCT, and clinical information yields performance approaching inter-specialist agreement levels even with extremely limited data (n=10); and (3) an interpretable, non-black-box design that directly links model decisions to pathophysiologically significant biomarkers, removing a significant obstacle to the clinical use of AI in ophthalmology.
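The evaluation protocol described above (leave-two-out cross-validation plus bootstrap confidence intervals) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: it assumes every unordered pair of patients is held out once (the abstract does not say whether pairs were stratified by class), it uses a percentile bootstrap over cases, and the labels below are toy values chosen only to match the cohort size, not the study's data. The helper names `leave_two_out_splits` and `bootstrap_ci` are hypothetical.

```python
import itertools
import random


def leave_two_out_splits(n_samples):
    """Yield (train_indices, test_indices) for every pair of held-out samples."""
    indices = range(n_samples)
    for test_pair in itertools.combinations(indices, 2):
        train = [i for i in indices if i not in test_pair]
        yield train, list(test_pair)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy labels (1 = glaucoma, 0 = control); hypothetical, not the study's predictions.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

splits = list(leave_two_out_splits(len(y_true)))
acc = accuracy(y_true, y_pred)
ci = bootstrap_ci(y_true, y_pred, accuracy)
print(len(splits), acc, ci)
```

In a real run, a model would be trained on each `train` split and its predictions on the two held-out patients pooled before computing metrics. With only 10 cases, bootstrap intervals of this kind are necessarily coarse, since each resample can take only a handful of distinct accuracy values.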