A Systematic Analysis of Performance Evaluation Metrics in Machine Learning Models
by Kabiru Ibrahim Musa, Mahmud Ahmed Usman, Muhammad Tella
Published: February 6, 2026 • DOI: 10.51584/IJRIAS.2026.11010070
Abstract
Machine Learning (ML) has become a critical computational paradigm shaping contemporary applications in domains such as finance, healthcare, and cybersecurity, making rigorous performance evaluation essential. However, the selection and interpretation of evaluation metrics remain inconsistent, often leading to misleading conclusions. This study presents a systematic analysis of the most commonly used performance evaluation metrics in ML, integrating a conceptual taxonomy, mathematical definitions, and empirical assessment under controlled perturbations. Metrics are categorized along three dimensions: discrimination, calibration, and robustness. Experiments conducted on classification and regression tasks, using both synthetic datasets and established benchmarks, evaluate metric behaviour under threshold variation, class imbalance, and label noise. The results show that no single metric captures model performance comprehensively, and that widely used metrics may yield conflicting or misleading assessments under certain conditions. Context-aware metric selection and multi-dimensional reporting are therefore necessary for reliable evaluation. By empirically linking metric behaviour to data characteristics, this study provides guidance for metric selection and reporting that is not only standardized but also evidence-based.
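The abstract's central claim, that widely used metrics can yield conflicting assessments under conditions such as class imbalance, can be illustrated with a minimal sketch (not taken from the paper's experiments). A trivial classifier that always predicts the majority class scores high accuracy yet zero F1 on the minority class; both metrics are computed here from first principles.

```python
# Minimal illustration (hypothetical data, not the paper's experiments) of how
# accuracy can mislead under class imbalance while F1 exposes the failure.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_positive(y_true, y_pred):
    """F1 score for the positive (minority) class, label 1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision/recall both collapse
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# Degenerate model: always predict the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))     # 0.95 -- looks strong
print(f1_positive(y_true, y_pred))  # 0.0  -- minority class never detected
```

The same divergence motivates the paper's call for multi-dimensional reporting: a single headline number (here, 95% accuracy) hides a complete failure on the class of interest.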