Enhancing Resemblance Matching using Structural Awareness for Hierarchical LLM Caching

by Dr. Chaitanya Udatha, Krithi Chippada, Satvik Dabbara

Published: May 27, 2026 • DOI: 10.51584/IJRIAS.2026.11050049

Abstract

Large Language Models (LLMs) have become an integral part of daily life. They power applications such as customer-service chatbots, and they require substantial computing power. When the user base is large, generating a fresh response for each of many similar queries wastes computation and increases latency. Hierarchical caching systems such as GPTCache and MinCache were introduced to reduce this redundant inference: they compare each incoming prompt with stored queries using exact matching, resemblance matching, and semantic matching, and reuse the cached LLM response when a match is found. However, unigram-based resemblance matching is susceptible to adversarial lexical reordering, which leads to excessive false-positive cache hits. This work introduces structure-aware resemblance matching, which improves the robustness of the system while staying true to MinCache's design philosophy of lightweight, fast similarity matching. The proposed approach achieves 7.39x safer cache reuse than standard 1-gram MinHash while preserving 79.5% of resemblance-layer throughput and maintaining overall accuracy.
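The reordering weakness mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it computes exact Jaccard similarity (the quantity that MinHash estimates) rather than MinHash signatures, and it uses word bigrams as an assumed stand-in for an order-sensitive representation, since the abstract does not specify the structural features used. The helper names `shingles` and `jaccard` and the example prompts are hypothetical.

```python
def shingles(text: str, n: int) -> set:
    """Return the set of word n-grams (n=1 gives a plain bag of words)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity, the quantity that MinHash estimates."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two prompts with identical words but opposite meaning (illustrative only).
cached = "transfer money from savings to checking"
reordered = "transfer money from checking to savings"

# Unigram sets ignore word order, so the two prompts look identical.
unigram_sim = jaccard(shingles(cached, 1), shingles(reordered, 1))

# Bigram sets (an assumed order-sensitive alternative) expose the difference.
bigram_sim = jaccard(shingles(cached, 2), shingles(reordered, 2))

print(f"unigram similarity: {unigram_sim:.2f}")  # 1.00
print(f"bigram similarity:  {bigram_sim:.2f}")   # 0.25
```

With a typical resemblance threshold, the unigram score of 1.00 would produce a cache hit and return the wrong cached response, whereas the bigram score of 0.25 would fall through to the next cache layer.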