Enhanced Retrieval-Augmented Generation Framework for Intelligent Multi-Document Question Answering

by Aviral Pandey, Dr. Lakshmi Dhevi B, Navya Kumar

Published: May 23, 2026 • DOI: 10.51244/IJRSI.2026.1305000033

Abstract

Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by grounding their answers in external documents. However, baseline RAG architectures are limited by single-modality retrieval, fixed-size chunking, and the absence of hallucination monitoring. This paper introduces a hybrid RAG framework for multi-document question answering that improves retrieval quality, contextual coherence, and response fidelity.

The proposed system combines dense semantic retrieval (FAISS over BAAI/bge-large-en-v1.5 embeddings) with sparse lexical retrieval (BM25Okapi). Reciprocal Rank Fusion (RRF) merges the ranked results from both modalities to improve recall without additional parameter tuning. To preserve the meaning of documents, a semantic chunking strategy splits them adaptively using sentence-level embeddings and percentile-based breakpoint detection. A cross-encoder reranker (ms-marco-MiniLM-L-12-v2) then refines the relevance scoring of the retrieved candidates.

To mitigate hallucination without additional computational overhead, a reference-free faithfulness score is computed as the cosine similarity between the embeddings of the generated response and those of the retrieved context. A multi-provider LLM abstraction layer gives different cloud-hosted models a common interface, so that each is grounded in the same retrieved context. The system is evaluated using Recall@K, Mean Reciprocal Rank (MRR), Precision@K, faithfulness score, and end-to-end latency. The results indicate better retrieval and more grounded generation than dense-only baselines.
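
As an illustration of the fusion step named in the abstract, the following is a minimal sketch of Reciprocal Rank Fusion in Python. It is not the authors' implementation: the function name `reciprocal_rank_fusion`, the document identifiers, and the smoothing constant `k = 60` (a commonly used default) are assumptions made for this example. Each document receives the sum of 1 / (k + rank) over every ranked list in which it appears, so documents ranked highly by both the dense and the sparse retriever rise to the top.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (assumed default of 60, not taken from the paper).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a dense (embedding) and a sparse (BM25) retriever.
dense_ranking = ["d3", "d1", "d7"]
sparse_ranking = ["d1", "d9", "d3"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

In this toy example, `d1` and `d3` appear in both lists and therefore outrank `d9` and `d7`, which each appear in only one. Because the fusion uses ranks rather than raw scores, the dense and sparse scores need no calibration against each other.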