Real-Time Customer Behavior Analysis Using Big Data Analytics and Machine Learning

by Dr. Lakshmi K., Smit Patel

Published: May 19, 2026 • DOI: 10.51244/IJRSI.2026.1304000240

Abstract

This paper presents a real-time customer behavior analysis system built on Big Data Analytics and machine learning techniques. The rapid growth of digital platforms, mobile applications, and transactional systems has produced large volumes of customer data that traditional processing methods struggle to handle efficiently. The proposed framework integrates Apache Kafka for real-time data ingestion, the Hadoop Distributed File System (HDFS) for scalable storage, and Apache Spark for high-speed, distributed data processing. Machine learning techniques are applied to three tasks: K-Means Clustering for customer segmentation, Logistic Regression for purchase prediction, and Association Rule Mining for product recommendation. The study addresses a key research gap by developing a unified, end-to-end pipeline that combines real-time processing with multiple analytical models and clearly defined evaluation metrics. Experimental results on a dataset of 500,000 customer records show that Logistic Regression achieves the highest accuracy (90%), outperforming Decision Trees (88%) and K-Means (85%). The results indicate that the proposed approach supports improved customer targeting, stronger retention strategies, and more effective data-driven decision-making.
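To make the product-recommendation step concrete, the following minimal sketch scores pairwise association rules by support, confidence, and lift over a few toy transactions. It then recommends the consequents of rules whose antecedent is already in a customer's basket. This is an illustrative single-machine example in plain Python, not the authors' implementation. The function names `pairwise_rules` and `recommend`, the thresholds, and the baskets are all hypothetical. At the data volumes described above, a distributed algorithm such as the FP-Growth implementation in Spark MLlib would be the more natural choice.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) tuples
    for rules with one item on each side (hypothetical helper)."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    # How many baskets contain each single item.
    item_counts = Counter(item for b in baskets for item in b)
    # How many baskets contain each unordered item pair.
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # Evaluate both directions of the pair: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence < min_confidence:
                continue
            lift = confidence / (item_counts[rhs] / n)  # vs. independence
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest rules first; ties broken alphabetically for determinism.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


def recommend(basket, rules):
    """Suggest consequents of rules whose antecedent is already in the basket."""
    seen = set(basket)
    recs = []
    for lhs, rhs, _support, _confidence, _lift in rules:
        if lhs in seen and rhs not in seen and rhs not in recs:
            recs.append(rhs)
    return recs


# Hypothetical toy baskets; these are not data from the study.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]
rules = pairwise_rules(transactions)
for lhs, rhs, s, c, l in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
print(recommend(["butter"], rules))
```

On these baskets, the sketch keeps three rules (butter → bread, bread → butter, milk → bread) and recommends bread for a basket that contains only butter. The milk → bread rule passes the confidence threshold but has a lift below 1. That means the two items co-occur less often than independence would predict, so a practical recommender would usually also filter on lift.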