Indigenous People’s Language Identification Using Machine Learning for Linguistic Preservation

by Adrales, Lorelyn F., Garingo, Joshua Razzi B., Geraldez, Jan Anthony Q., Taboada, Vene Lucille T., Taladtad, Jelan Roy L.

Published: November 8, 2025 • DOI: 10.51584/IJRIAS.2025.1010000080

Abstract

Language is a fundamental aspect of human identity, deeply connected to geographical origins, cultural heritage, and social belonging. However, many indigenous languages across the world are gradually declining due to modernization, migration, and the growing influence of technology and global languages. The loss of these languages often leads to the disappearance of cultural values, oral traditions, and historical knowledge. This study explores the integration of machine learning techniques, namely Long Short-Term Memory (LSTM) networks, Yoon Kim's Convolutional Neural Network (CNN) model, and TextConvoNet, in developing a mobile text-to-text identification and translation application for Blaan dialects spoken in General Santos City, Polomolok, and Sarangani. The application aims to aid in the preservation and revitalization of the Blaan language while providing an accessible platform for both native speakers and learners to understand, translate, and communicate in their local dialects.
To evaluate the usability and effectiveness of the application, User Acceptance Testing (UAT) was conducted among selected users. Data were collected through structured interviews, document analysis, and standardized evaluation tools to ensure comprehensive assessment and validation. Experimental results showed that the TextConvoNet model achieved the highest accuracy rate of 74.00 percent, surpassing both the LSTM and CNN-based models. This demonstrates the model's effectiveness in identifying and classifying Blaan dialects, highlighting its potential in the field of Natural Language Processing (NLP).
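The dialect-identification task evaluated above can be illustrated with a minimal character n-gram baseline. This is a deliberately simple stand-in for the LSTM, CNN, and TextConvoNet models compared in the study, not the authors' actual implementation, and the training sentences below are invented placeholders rather than real Blaan data:

```python
from collections import Counter

def char_ngrams(text, n=3):
    # Character n-grams capture the orthographic patterns that
    # often distinguish closely related dialects.
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

class NgramDialectClassifier:
    """Toy nearest-profile classifier: each dialect is represented
    by the pooled n-gram counts of its training sentences."""

    def __init__(self, n=3):
        self.n = n
        self.profiles = {}

    def fit(self, samples):
        # samples: iterable of (sentence, dialect_label) pairs
        for text, label in samples:
            profile = self.profiles.setdefault(label, Counter())
            profile.update(char_ngrams(text, self.n))
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)

        def overlap(label):
            profile = self.profiles[label]
            total = sum(profile.values()) or 1
            # Normalized n-gram overlap between input and dialect profile
            return sum(grams[g] * profile[g] / total for g in grams)

        return max(self.profiles, key=overlap)

# Hypothetical placeholder sentences (not actual Blaan transcriptions):
train = [
    ("fye kdee", "dialect_a"),
    ("fye gefi", "dialect_a"),
    ("madyaw adlaw", "dialect_b"),
    ("madyaw gabi", "dialect_b"),
]
clf = NgramDialectClassifier().fit(train)
print(clf.predict("fye"))  # prints "dialect_a"
```

Neural models like TextConvoNet replace the hand-built n-gram profiles with learned convolutional filters over embedded text, which is what allows them to generalize beyond surface overlap, but the classification setup (text in, dialect label out) is the same.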
Future research should focus on expanding the dataset by collecting transcriptions from diverse age groups, locations, and communication contexts to improve model generalization and accuracy. Further refinement of the model architecture and tuning of its parameters are also recommended to enhance dialect classification and translation capabilities. Moreover, integrating speech-to-text and text-to-speech functionalities could enable real-time translation, pronunciation learning, and accessibility for non-literate speakers, ensuring the continued preservation and appreciation of indigenous languages.