Zenthera: A High-Speed Antimicrobial Resistance Prediction Pipeline Using K-Mer Analysis and Tree-Based Ensembles

by Dr. Roshni Padate, Jahnavi Shah, Shreysh Nair, Tanish Ingole, Tanmay Mahajan

Published: May 30, 2026 • DOI: 10.51584/IJRIAS.2026.11050066

Abstract

Antimicrobial resistance (AMR) is a rapidly growing problem in modern medicine. When doctors do not know exactly which bacterium is causing an infection, they often prescribe broad-spectrum antibiotics, a practice that accelerates the evolution of drug-resistant pathogens. The standard method for determining which drug will work is Antibiotic Susceptibility Testing (AST). However, AST requires culturing bacteria in a laboratory, which takes 24 to 72 hours. In this paper, we introduce Zenthera, a computational biology pipeline designed to skip the culturing step entirely. The system uses raw Whole Genome Sequencing (WGS) data to predict resistance to 14 antibiotics in real time. Instead of slow sequence alignment, the pipeline uses a k-mer (k=7) frequency approach combined with TF-IDF vectorization. We trained Random Forest and XGBoost models on a dataset of over 100,000 bacterial genomes, achieving an average accuracy of 92.4% and an F1-score of 0.91. With GPU acceleration, the system processes a genome and returns a clinical prediction in less than a second. To make the system usable by doctors, we deployed the models inside a full-stack web application. Zenthera shows that the waiting time of traditional lab tests can be eliminated without losing accuracy.
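To illustrate the k-mer (k=7) frequency and TF-IDF step named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names `kmer_counts` and `tfidf_vectors`, the decision to skip windows containing non-ACGT bases, and the smoothed IDF formula are all assumptions made for illustration. With k=7 there are at most 4^7 = 16,384 distinct k-mers, so each genome maps to a fixed-size feature vector that a tree-based model such as Random Forest or XGBoost could consume.

```python
import math
from collections import Counter

K = 7  # k-mer length stated in the abstract


def kmer_counts(sequence, k=K):
    """Count overlapping k-mers, skipping windows with non-ACGT bases (assumed policy)."""
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def tfidf_vectors(genomes, k=K):
    """Return (sorted vocabulary, one sparse TF-IDF dict per genome)."""
    per_genome = [kmer_counts(g, k) for g in genomes]
    n = len(per_genome)

    # Document frequency: in how many genomes each k-mer appears at least once.
    doc_freq = Counter()
    for counts in per_genome:
        doc_freq.update(counts.keys())

    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1
    idf = {kmer: math.log((1 + n) / (1 + df)) + 1.0
           for kmer, df in doc_freq.items()}

    vectors = []
    for counts in per_genome:
        total = sum(counts.values())
        # Term frequency is the k-mer's share of all valid k-mers in the genome.
        vec = {kmer: (c / total) * idf[kmer] for kmer, c in counts.items()}
        vectors.append(vec)
    return sorted(idf), vectors


if __name__ == "__main__":
    # Toy sequences for demonstration only; real inputs would be assembled genomes.
    vocab, vecs = tfidf_vectors(["ACGTACGTA", "ACGTACGCC"])
    print(len(vocab), "distinct 7-mers")
    for vec in vecs:
        print({kmer: round(w, 3) for kmer, w in sorted(vec.items())})
```

In this sketch, k-mers shared by every genome receive the lowest weight, while k-mers found in only a few genomes are weighted up. That is the property that could let rare, resistance-associated sequence fragments stand out as features without any alignment step.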