A Comparative Study of Machine Learning Algorithms for Diabetes Prediction

by Bakwa Dungka Dirting, Dimka Betty, Dr. Godwin Thomas Ayenajeh, Madugu Jimme Mangai, Oguche David Enekai, Stephen Mallo JR

Published: February 6, 2026 • DOI: 10.51584/IJRIAS.2026.110100109

Abstract

This study evaluates the performance of six machine learning models—Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), Decision Tree (DT), Random Forest (RF), and Gradient Boosting Classifier (GBC)—on a binary diabetes prediction task. Among these, Random Forest (RF) achieved the highest accuracy (78.57%) and ROC-AUC (0.83), indicating superior overall predictive capability, albeit with a lower recall (0.56), suggesting a trade-off in detecting positive cases. Gradient Boosting (GBC) and KNN demonstrated balanced performance, with competitive F1-scores (0.69 and 0.68, respectively) and robust recall (0.73 and 0.71), making them suitable for scenarios requiring a balance between precision and sensitivity. The Decision Tree (DT) model exhibited the highest recall (0.75), excelling at identifying true positives but at the cost of lower precision (0.62). While most models (LR, KNN, SVC, RF, GBC) maintained strong ROC-AUC scores (>0.80), SVC had the lowest accuracy (73.38%) and F1-score (0.60). The results suggest that model selection should be guided by specific priorities: RF for optimal accuracy and AUC, GBC/KNN for balanced metrics, and DT for maximizing true-positive detection. These findings highlight the importance of aligning model choice with application-specific requirements in classification tasks.
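The precision/recall/F1 trade-offs discussed above follow directly from the confusion-matrix definitions of these metrics. The sketch below computes them in plain Python from hypothetical confusion-matrix counts (the counts are illustrative and chosen by us, not taken from the study); it shows how a model can score high recall while its precision lags, as reported for DT.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix
    counts: true positives, false positives, false negatives, true negatives."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)        # of predicted positives, how many were right
    recall = tp / (tp + fn)           # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (NOT from the study) illustrating a high-recall,
# lower-precision model: most true diabetics are caught (recall 0.75),
# but many predicted positives are false alarms (precision ~0.62).
m = classification_metrics(tp=42, fp=26, fn=14, tn=72)
print(f"accuracy={m['accuracy']:.4f}  precision={m['precision']:.4f}  "
      f"recall={m['recall']:.4f}  f1={m['f1']:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it penalizes imbalance between the two, which is why GBC and KNN, with no single weak metric, post the strongest F1 scores in the study.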