AdverShield-LLM: Adversarial Robustness Certification for IoT-Integrated Retrieval-Augmented Generation via Randomized Smoothing
by Yasser Samir Hadi
Published: June 11, 2026 • DOI: 10.51244/IJRSI.2026.1305000230
Abstract
With the emergence of Internet of Things (IoT) ecosystems, Retrieval-Augmented Generation (RAG) systems have become commonplace as a means of embedding dynamically retrieved external knowledge in the responses of a Large Language Model (LLM). Despite these benefits, IoT-enabled RAG pipelines are exposed to significant adversarial threats, such as injecting poisoned passages into the IoT knowledge base, manipulating dense retrieval embeddings, and mounting indirect prompt injection attacks through IoT sensor inputs, all of which can degrade the fidelity of generated responses and compromise the trustworthiness of the system. Current defenses rely mostly on heuristic filtering or empirical adversarial training; they either lack certified robustness guarantees or are fragile under adaptive adversaries.
In response to these challenges, this article introduces AdverShield-LLM, a new certified defense framework that combines randomized smoothing with a multi-granular noise injection mechanism suited to the distributed, low-latency requirements of RAG systems in the IoT domain. AdverShield-LLM consists of three synergistic modules: (i) a Passage-Level Smoothed Aggregation (PLSA) module, which certifies the robustness of RAG retrieval against bounded corpus poisoning under an isolate-then-smooth paradigm; (ii) a Token-Adaptive Gaussian Defense (TAGD) layer, which certifies LLM generation against indirect prompt injection by propagating l_2-norm perturbation bounds through the transformer attention stack; and (iii) an IoT-Aware Certified Radius Scheduler (IACRS), which dynamically allocates noise budgets among constrained edge nodes while preserving the certified radius.
AdverShield-LLM is evaluated on three IoT security benchmarks: MS-RAG-IoT, NQ-Adversarial, and IoTQA-Poison. Extensive experiments show that it achieves a certified accuracy of 81.4% at l_2 perturbation radius σ=0.50, a gain of 9.3% over the strongest baseline, RobustRAG, and that it reduces the attack success rate under PoisonedRAG from 74.2% to 8.6%. Moreover, AdverShield-LLM keeps clean-answer accuracy within 2.1% of the undefended RAG accuracy, indicating that certified robustness need not compromise the utility of RAG in resource-limited IoT environments.
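For readers unfamiliar with randomized smoothing, the technique the framework builds on, the following is a minimal sketch of the generic Gaussian smoothing certificate, not of the PLSA, TAGD, or IACRS modules themselves. It assumes a hypothetical `classifier` callable that maps a list of floats to a label, and the names `certify`, `n0`, `n`, and `alpha` are illustrative choices. For simplicity it uses a Hoeffding lower confidence bound on the top-class probability, where the standard procedure typically uses an exact binomial bound.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def _noisy_votes(classifier, x, sigma, n, rng):
    """Count the labels predicted on n Gaussian-perturbed copies of x."""
    votes = Counter()
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classifier(noisy)] += 1
    return votes


def certify(classifier, x, sigma, n0=100, n=2000, alpha=0.001, seed=0):
    """Return (label, certified l_2 radius), or (None, 0.0) to abstain.

    Assumption: `classifier` is any callable from a list of floats to a
    hashable label; it stands in for the base model being smoothed.
    """
    rng = random.Random(seed)
    # Step 1: pick a candidate top class from a small selection sample.
    guess = _noisy_votes(classifier, x, sigma, n0, rng).most_common(1)[0][0]
    # Step 2: estimate its probability on a fresh, independent sample.
    count = _noisy_votes(classifier, x, sigma, n, rng)[guess]
    # One-sided Hoeffding bound: holds with probability at least 1 - alpha.
    p_lower = count / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # not confident enough to certify anything
    # Certified radius: sigma * Phi^{-1}(p_lower).
    return guess, sigma * NormalDist().inv_cdf(p_lower)
```

With a toy classifier such as `lambda v: 1 if v[0] > 0 else 0`, calling `certify(clf, [2.0, 0.0], sigma=0.5)` returns the label together with a radius that cannot exceed the true distance to the decision boundary. A larger `sigma` allows larger radii but usually lowers clean accuracy, which is the trade-off a noise-budget scheduler has to manage.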