A Novel Recurrent Convolutional Neural Network Framework for Continuous Sign Language Recognition Using Iterative Training and Multimodal Fusion

by Harish H, Kondragunta Rama Krishnaiah, P Vamsi Krishna

Published: June 3, 2026 • DOI: 10.51244/IJRSI.2026.1305000127

Abstract

We present a novel approach to continuous Sign Language (SL) recognition that uses a Recurrent Convolutional Neural Network (RCNN) with an iterative training process and multimodal fusion. Our primary goal is to transcribe continuous SL video streams accurately into ordered gloss sequences, overcoming the limitations of traditional methods that rely on frame-wise labeling and Hidden Markov Models (HMMs). Because training data are limited, we introduce an iterative optimization process that refines the gestural alignments so that model performance improves across training iterations. We also incorporate a multimodal fusion strategy that combines RGB frames with optical flow to capture both appearance and motion cues, which strengthens the spatiotemporal feature representation. Experimental results show that our approach achieves higher recognition accuracy and lower Word Error Rate (WER) than existing SL recognition methods, indicating potential for real-world applications such as real-time SL translation and human-computer interaction. The system remains robust on unsegmented video streams, making it a promising solution for continuous SL recognition tasks.
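For readers unfamiliar with the WER metric named above, the following is a minimal sketch of how it is conventionally computed over gloss sequences: the number of substitutions, deletions, and insertions needed to turn the predicted sequence into the reference, divided by the reference length. The function name `word_error_rate` and the example glosses are illustrative assumptions and are not taken from our evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Conventional WER: (substitutions + deletions + insertions) / len(reference).

    Both arguments are lists of gloss tokens. This is the standard
    Levenshtein-based definition, assumed here for illustration.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one gloss")

    # prev[j] holds the edit distance between the first i-1 reference
    # glosses and the first j hypothesis glosses.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference gloss
                curr[j - 1] + 1,     # insertion of an extra hypothesis gloss
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[m] / n


# Hypothetical gloss sequences, for illustration only.
reference = ["TOMORROW", "RAIN", "NORTH", "WIND"]
hypothesis = ["TOMORROW", "NORTH", "STRONG", "WIND"]
print(word_error_rate(reference, hypothesis))  # 0.5
```

In this example two edits are needed against a four-gloss reference, giving a WER of 0.5. Lower values indicate better recognition, and the value can exceed 1.0 when the hypothesis contains many insertions.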