A Real-Time Multimodal Approach to Mental Health Monitoring and Analysis
by Mrs. T. Gayathri Devi
Published: June 17, 2026 • DOI: 10.51584/IJRIAS.2026.11060013
Abstract
The Multimodal Mental Health Analysis System assesses a user's mental wellness through a combined analysis pipeline that accepts PHQ-9 questionnaire responses, written personal narratives, spoken video recordings, and optional location data. Text responses are processed with scoring methods and natural language processing techniques to identify patterns related to emotional distress, depressive symptoms, anxiety, burnout, and overall mental health. Audio extracted from the spoken video response is analysed for transcript content, pitch, energy, and vocal variation to capture tone-related cues, while video frames are examined with computer vision methods to estimate facial emotion patterns and visual signs of stress.
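The following Python sketch shows what rule-based questionnaire scoring could look like. It sums the nine PHQ-9 items (each answered 0-3), maps the total (0-27) onto the standard published severity bands, and flags any nonzero answer to item 9, which asks about thoughts of self-harm. The names `score_phq9` and `PHQ9Result` are hypothetical. The abstract does not say which cutoffs or flagging rules the system actually applies, so the use of the standard bands is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Standard PHQ-9 severity bands over the 0-27 total (assumed, not taken from the paper).
SEVERITY_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


@dataclass
class PHQ9Result:  # hypothetical result container
    total: int
    severity: str
    self_harm_flag: bool


def score_phq9(responses: list[int]) -> PHQ9Result:
    """Score nine PHQ-9 item responses, each on the 0-3 scale."""
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each PHQ-9 item must be scored 0-3")
    total = sum(responses)
    severity = next(label for lo, hi, label in SEVERITY_BANDS if lo <= total <= hi)
    # Item 9 (index 8) concerns thoughts of self-harm; any nonzero answer is flagged.
    return PHQ9Result(total, severity, responses[8] > 0)
```

For example, `score_phq9([1, 2, 1, 0, 2, 1, 1, 0, 0])` gives a total of 8, severity "mild", and no self-harm flag. The item 9 flag is evaluated regardless of the total, so a low overall score cannot hide a positive answer to that item.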
These signals are integrated by a weighted scoring model that produces an overall wellness score, a confidence level, a risk summary, and clinical flags. When the assessment indicates elevated concern, the system provides personalized recommendations, crisis-support guidance where necessary, and nearby mental health resources discovered through the Google Places API or OpenStreetMap. The system also stores session data such as questionnaire scores, transcript summaries, audio-video analysis results, and generated reports. By combining rule-based PHQ-9 scoring, Hugging Face transformer emotion models, OpenAI Whisper speech recognition, DeepFace facial analysis, and optional Gemini LLM synthesis, the project offers a practical real-time mental health screening and support platform that helps users reflect on their condition and seek timely professional help.
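The weighted fusion step can be illustrated with a minimal sketch, assuming each modality has already been reduced to a wellness score in [0, 1] (higher meaning better). The weights, the function name `fuse_scores`, and the rule that confidence equals the share of total weight actually observed are all hypothetical. The abstract does not report the system's real weights or how its confidence level is derived.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical per-modality weights; the paper does not publish its values.
DEFAULT_WEIGHTS = {"phq9": 0.4, "text": 0.2, "audio": 0.2, "video": 0.2}


def fuse_scores(
    scores: dict[str, Optional[float]],
    weights: Optional[dict[str, float]] = None,
) -> tuple[float, float]:
    """Fuse per-modality scores in [0, 1] into a (wellness 0-100, confidence 0-1) pair.

    Missing modalities (None) are skipped and the remaining weights are
    renormalised; confidence is the fraction of total weight that was observed.
    """
    weights = weights if weights is not None else DEFAULT_WEIGHTS
    available = {m: s for m, s in scores.items() if s is not None and m in weights}
    if not available:
        raise ValueError("at least one modality score is required")
    for m, s in available.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"{m} score must be in [0, 1]")
    observed_weight = sum(weights[m] for m in available)
    # Weighted mean over the modalities that were actually present.
    wellness = sum(weights[m] * s for m, s in available.items()) / observed_weight
    confidence = observed_weight / sum(weights.values())
    return round(100 * wellness, 1), round(confidence, 2)
```

With the video feed missing, `fuse_scores({"phq9": 0.7, "text": 0.5, "audio": 0.6, "video": None})` renormalises over the remaining weights and returns a wellness score of 62.5 with confidence 0.8. Renormalising in this way means an absent modality lowers confidence instead of pulling the wellness score toward zero.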