Comparison of Similarity Distance-Based Metrics for HODA and BANGLA Dataset for Enhanced Precision

by Amirul Ramzani Radzid, Mgd Maaz Taha Yassin, Mohd Sanusi Azmi, Nur Atikah Arbain

Published: December 30, 2025 • DOI: 10.47772/IJRISS.2025.91200013

Abstract

A similarity metric measures the degree of similarity between two objects or pieces of data. Such metrics are essential in many areas of study, including data analysis, machine learning, and image processing, where they provide a way to compare and evaluate different entities. They can be categorized into distance-based and similarity-based approaches, each with its own strengths and applications. This study compares the effect of various distance metrics on image classification performance using the HODA and Bangla handwritten digit datasets. Eight distance measures, namely Euclidean, Manhattan, Chebyshev, Canberra, Cosine, Minkowski, Jaccard, and Sørensen, are evaluated within the Mean Average Precision (MAP) framework to assess their effectiveness for handwritten digit recognition. Experimental results show that Chebyshev distance produces the highest classification accuracy, 71.6%, on the HODA dataset, while Euclidean distance achieves the best performance on the Bangla dataset, with 70.7% accuracy. In addition to the quantitative analysis, a user study based on a structured questionnaire was conducted to qualitatively verify the MAP-based evaluation methodology, and the results of the user evaluations reinforce the empirical findings. The study thus underlines the importance of choosing a distance metric suited to the specific properties of the dataset, and highlights the role of that choice in improving the performance of pattern recognition systems in computer vision applications.
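
For readers who want to see the eight measures and the MAP score side by side, the following is a minimal Python sketch. It is not the authors' implementation: the function names, the Minkowski order `p=3`, and the real-valued forms chosen for Jaccard and Sørensen are assumptions made for illustration, and the paper may use different variants. Inputs are assumed to be equal-length, non-negative feature vectors.

```python
import math


def _abs_diffs(a, b):
    """Element-wise absolute differences of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [abs(x - y) for x, y in zip(a, b)]


def minkowski(a, b, p=3):
    # p=3 is an arbitrary illustrative default; p=1 gives Manhattan, p=2 Euclidean.
    return sum(d ** p for d in _abs_diffs(a, b)) ** (1.0 / p)


def euclidean(a, b):
    return math.sqrt(sum(d * d for d in _abs_diffs(a, b)))


def manhattan(a, b):
    return sum(_abs_diffs(a, b))


def chebyshev(a, b):
    return max(_abs_diffs(a, b))


def canberra(a, b):
    # Terms whose denominator is zero (both entries zero) are skipped.
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def cosine(a, b):
    # Cosine distance = 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def jaccard(a, b):
    # Assumed real-valued form: sum (a-b)^2 / (sum a^2 + sum b^2 - sum a*b).
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return sum(d * d for d in _abs_diffs(a, b)) / denom


def sorensen(a, b):
    # Assumed form (also known as Bray-Curtis): sum |a-b| / sum (a+b).
    return sum(_abs_diffs(a, b)) / sum(x + y for x, y in zip(a, b))


def average_precision(relevance):
    """AP of one ranked result list given as 0/1 relevance flags.

    Normalizes by the number of relevant items found in the list.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)
```

In a retrieval-style evaluation under these assumptions, each query image's nearest neighbours would be ranked by one of the distance functions, each neighbour flagged as relevant when its digit label matches the query, and the resulting flag lists passed to `mean_average_precision`.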