You already secured the model. Great. But the part no one’s talking about (the piece feeding it instructions, history, and memory) is wide open.
Most teams using GenAI are moving fast and bolting on Model Context Protocols (MCPs) without questioning what they were designed to do. These protocols weren’t built for secure execution, let alone for environments with sensitive data or regulated workloads. There’s no built-in access control, no isolation between context sources, and no real validation for what gets injected. So when something goes wrong, it’s not because the model was weak, but because the protocol gave attackers a front-row seat.
We’re seeing prompt injection, data leakage, model hijacking, and full-on RCE hit systems that passed every security review, because MCP wasn’t even on the checklist. And the worst part? This happens in dev, staging, and production, because context management isn’t treated as a security surface. But it absolutely is.
You’re not interacting with the model directly. Instead, you’re working through the protocol that feeds it, and that’s the layer attackers are exploiting.
Model Context Protocol (MCP) is the system that manages how prompts, tokens, memory, and instructions get delivered to the model during runtime. It acts as the glue between your LLM and the rest of your application stack. Whether you’re running a basic chat interface, a Retrieval-Augmented Generation (RAG) pipeline, or a multi-agent orchestration framework, the model can’t operate without a tightly managed stream of context. That stream is controlled by MCP.
Here’s what MCP typically handles under the hood: prompt assembly and templating, token budgeting and context-window management, conversation history and memory, and the injection of retrieved documents, tool outputs, and system instructions.
In RAG systems, MCP is the layer that injects the retrieved documents into the prompt. In agent frameworks, it handles the step-by-step outputs that get routed back into planning and execution. In fine-tuned enterprise interfaces, it governs how user roles, data access levels, and workflows are dynamically translated into model interactions.
This protocol layer sits between the raw model and the application logic. It parses signals, shapes memory, and decides what the model pays attention to. And it rarely has any security controls in place.
Attackers are going straight for this layer, because once they get into the context stream, they can inject hostile prompts that steer model behavior, manipulate memory that persists across turns, leak sensitive data pulled into the window, and abuse dynamically executed tools to reach downstream systems.
All of this happens outside the model. The base LLM might pass every safety test, and the app might have basic sanitization in place, but once you let external input flow into MCP-managed context, the threat surface shifts completely.
We’ve seen teams roll out RAG pipelines where document fetchers drop raw text directly into the prompt. We’ve seen agent loops reuse outputs from unauthenticated users across parallel sessions. We’ve seen devs wrap open-source orchestration layers around LLMs and assume that because the model is locked down, the system is safe.
It’s not.
MCP is the part making real-time decisions about what the model sees and how it behaves. And in most orgs, it’s treated like plumbing instead of the critical control surface it is.
Securing this layer means applying the same rigor you’d use anywhere else in your stack: validation, access control, isolation, and audit.
You don’t need to guess where the risk is coming from. You can see it in every place context flows without boundaries or audit.
Your model gets compromised through the protocol feeding it instructions, memory, and user inputs in real time. And most security teams aren’t reviewing that layer at all.
These are the most critical MCP flaws we’re seeing in live deployments, across internal tools, customer-facing apps, and production RAG systems. They’re common, high-impact, and almost always missed in standard LLM security reviews.
Attackers don’t need to jailbreak the model when they can just poison the input stream. The risk starts when user inputs are appended directly to prompt templates without structured encoding, isolation, or boundaries. Once included, these inputs influence how the model interprets the entire context, especially in systems that reuse history, support multi-turn sessions, or preserve memory across user actions.
This gets worse in agent or orchestration frameworks where model outputs are looped back into the system as inputs for the next step. When previous context isn’t properly filtered, a crafted instruction can persist across multiple inference steps or alter downstream execution logic. Teams that treat prompts as strings instead of structured objects are especially vulnerable.
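To make the failure mode concrete, here's a minimal sketch (plain Python, illustrative names, no specific framework assumed) contrasting raw string concatenation with structured, role-separated assembly that neutralizes directive-style markers before they reach the model:

```python
# Illustrative sketch, not any specific framework's API.
import re

SYSTEM_PROMPT = "You are a deployment assistant. Only answer questions about build status."

def build_prompt_unsafe(user_input: str, history: list[str]) -> str:
    # Anti-pattern: user text and prior turns become indistinguishable from instructions.
    return SYSTEM_PROMPT + "\n" + "\n".join(history) + "\n" + user_input

def neutralize(text: str) -> str:
    # Strip markers commonly used to impersonate system or tool roles.
    text = re.sub(r"(?im)^\s*(system|assistant|tool)\s*:", "[removed-role-marker]", text)
    return text.replace("<<", "«").replace(">>", "»")

def build_messages(user_input: str, history: list[dict]) -> list[dict]:
    # Safer: structured messages with explicit roles; untrusted content never
    # joins the system channel, and prior turns keep their original roles.
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + [{"role": m["role"], "content": neutralize(m["content"])} for m in history]
        + [{"role": "user", "content": neutralize(user_input)}]
    )
```

The structured version doesn't make injection impossible, but it keeps untrusted text out of the instruction channel and gives downstream guards something they can actually inspect.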
LLMs operate within strict context limits, and most implementations rely on sliding windows or summarization to manage token budgets. But systems often over-prioritize convenience by preserving full histories, cached memory, and function logs in a single context payload.
In practice, this results in sensitive or stale data leaking into prompts.
In multi-user or multi-tenant deployments, a lack of isolation between sessions increases the chance of accidental exposure. Without scoped memory boundaries, context can persist far longer than intended, across users, features, or environments.
Retrieval-Augmented Generation often pulls documents from vector stores, external APIs, or internal KBs. The issue is that these documents are rarely validated or sanitized before prompt assembly. It’s common to see RAG components insert user-submitted or externally sourced content into the prompt verbatim, with nothing checking it in between.
This gives attackers the ability to inject arbitrary content into prompts simply by influencing what gets indexed or retrieved. And because retrieval is often detached from inference logs, those injections are hard to trace after the fact.
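A hedged sketch of the mitigation, assuming a generic retriever that returns (text, source_id) pairs: retrieved chunks are wrapped in explicit delimiters, tagged with their source, and screened for instruction-like phrasing before they are assembled into the prompt.

```python
# Illustrative sketch; the SUSPECT patterns are examples, not a complete filter.
import re
from typing import Optional

SUSPECT = re.compile(r"(?i)\b(ignore (all|previous) instructions|you are now|system prompt)\b")

def prepare_retrieved_chunk(text: str, source_id: str) -> Optional[str]:
    if SUSPECT.search(text):
        return None  # quarantine instead of injecting; log source_id for review
    # Wrap as data with explicit provenance so the chunk can't masquerade as instructions.
    return f"<document source='{source_id}'>\n{text.strip()}\n</document>"

def assemble_rag_context(chunks: list[tuple[str, str]]) -> str:
    prepared = (prepare_retrieved_chunk(text, source) for text, source in chunks)
    return "\n".join(p for p in prepared if p is not None)
```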
Most prompt assembly logic accepts inputs from multiple services: user interfaces, agents, tools, plugins, and workflow engines. In real deployments, there’s rarely any verification of who is allowed to inject context, or under what conditions.
We’ve seen systems where any backend service, plugin, or workflow engine can write into the prompt, with no check on who it is or what it’s allowed to contribute.
This creates a path for lateral movement, where one compromised input path allows broader influence over model behavior. It also makes it impossible to apply controls like role-based prompt construction or environment-based access restrictions.
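One way to close that gap, sketched here with hypothetical contributor names: require every context contributor to register with an explicit scope, and reject inserts from unknown or out-of-scope sources instead of treating context as a shared buffer.

```python
# Sketch with assumed contributor and role names.
from dataclasses import dataclass

# Each registered contributor may only insert content under its declared roles.
ALLOWED_CONTRIBUTORS = {
    "ui.chat": {"user"},
    "rag.retriever": {"document"},
    "agent.planner": {"assistant"},
}

@dataclass
class ContextInsert:
    contributor: str
    role: str
    content: str

def admit(insert: ContextInsert) -> bool:
    allowed_roles = ALLOWED_CONTRIBUTORS.get(insert.contributor)
    if allowed_roles is None:
        return False                     # unregistered source: drop and alert
    return insert.role in allowed_roles  # registered, but only within its scope
```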
When something breaks (or worse, when an attacker manipulates behavior), teams need to know exactly what was passed to the model at the time of inference. But most systems only log the top-level prompt or user input, ignoring the rest of the assembly: system instructions, retrieved documents, memory contents, and tool outputs.
Without structured and versioned logging of these components, it becomes impossible to reproduce outputs, investigate incidents, or demonstrate compliance. This is a gap in visibility that attackers can exploit to move through the system undetected.
These vulnerabilities persist because the protocol layer looks like an implementation detail. But in a GenAI system, that layer is the actual runtime environment, and it’s exposed by design. When you let models act on real data, integrate with business logic, and drive workflows, MCP becomes a full-blown attack surface. You either review it like you would any other execution layer, or you leave it open to abuse.
When you leave the context layer unprotected, you're not just exposing the model, but also the systems, tools, and environments that trust that model to behave safely. And when that trust gets broken, the consequences hit availability, integrity, and confidentiality in ways that most security reviews aren’t even looking for.
The risk actually begins with how you manage what the model sees, remembers, and acts on.
In most LLM agent setups, like those built with LangChain or similar orchestration frameworks, the model doesn’t just generate text. It makes decisions. It picks tools, triggers actions, and moves workflows forward. Those steps rely on trusted context to operate correctly. When that context is compromised, so is every downstream decision.
Here’s how this plays out: an attacker injects prompts that alter the agent’s reasoning chain, redirecting the model to execute a tool it wasn’t supposed to touch, or sending manipulated parameters into an external system. If your tools are wired to deploy, update configurations, or trigger workflows automatically, those actions go through, because the context said they were valid. And because everything is piped through what looks like an expected model response, there’s no alert until something goes wrong.
MCP implementations that preserve conversation history or memory across turns often do so without tightly binding that context to a single user or session. That creates scenarios where sensitive data from one interaction shows up in another, especially in environments where users share infrastructure or where agents interact across multiple projects. Common causes include shared memory buffers, missing session isolation, and persistent context that never expires.
This creates exposure for customer data, internal configurations, and even security controls. And it happens quietly, because the model output still looks coherent, even when the context was corrupted.
It’s increasingly common for GenAI systems to interface with internal tools, from CLI commands to API calls to code deployment systems. And many of those pipelines are designed to interpret model outputs as instructions. When you skip input validation or fail to constrain tool invocation, you open up execution risk. We’ve seen model outputs piped straight into CLI commands, API calls, and deployment steps with nothing validating them in between.
An attacker who gains influence over context (even briefly) can use that position to generate payloads that execute downstream, especially in orchestrated agent setups where each output feeds the next step without enforced guardrails.
Picture a dev team that wires up a GenAI assistant to help manage CI workflows. The assistant reviews pull requests, manages build configurations, and recommends deployment actions. It uses context memory to track past builds, flags, and approvals, and it interfaces directly with the pipeline controller through authenticated APIs.
Now introduce a prompt crafted by a malicious contributor. The assistant parses the PR comment, pulls it into its reasoning loop, and due to insufficient filtering, updates its internal decision logic. The next time it runs, it pushes that PR forward, skips tests, and modifies deployment variables, all because the model was tricked into treating malicious context as a valid command path.
At this point, you’re not dealing with a failed code review, but with a live system that was reprogrammed through prompt manipulation, and because there’s no inference trace tied to the actual commands issued, you can’t prove where the compromise happened.
This is what happens when MCP vulnerabilities are ignored. The system continues to function, but the control plane underneath is no longer secure. Once attackers influence context, they gain access to decisions, data, and tools, often without triggering a single traditional security control. That’s the exposure. That’s why this isn’t just about LLM safety, but securing your real infrastructure.
A full audit is not always needed to spot the weak point in your GenAI deployment. Just look closely at how context is assembled, how tools are invoked, and how memory is scoped. These red flags show where protocol-layer exposure turns into real operational risk.
LangChain, Semantic Kernel, and LlamaIndex make it easy to compose tools, memory, and prompts. But ease of use often means insecure defaults. In real deployments, we’ve seen those defaults shipped straight to production, untouched and unreviewed.
These defaults aren’t built to withstand adversarial input. Without review and customization, they turn the framework into a wide-open surface for context injection, leakage, or command chaining.
This is the most common and most damaging mistake. Raw input from UI forms, chat interfaces, or API requests gets passed directly into the prompt buffer. When that happens, untrusted text sits in the same channel as trusted instructions, and the model has no way to tell them apart.
Filtering means enforcing structure: escaping user-controlled text, validating schema compliance, and separating untrusted data from model-level directives.
If your logs don’t capture everything the model saw (not just the user input), you’re missing critical audit data. That includes system prompts, retrieved documents, memory contents, and tool outputs.
Without this, your incident response is broken. You can’t investigate behavior, reproduce outputs, or verify which part of the context triggered a fault or leak.
This happens more often than teams realize. Developers embed internal tokens, API keys, or auth headers into system prompts or tool logic that gets included in the prompt window.
Once those secrets are in the prompt, they can leak through model outputs, logs, or prompt injection. And since context is dynamic, you may not even know when they were exposed.
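A minimal sketch of the alternative, assuming a hypothetical internal CI endpoint and a CI_API_TOKEN environment variable: the credential is resolved inside the tool at call time, so it never enters the prompt window and the model only ever sees the tool's truncated result.

```python
# Sketch only: the endpoint and environment variable are hypothetical.
import os
import urllib.request

def fetch_build_status(build_id: str) -> str:
    token = os.environ["CI_API_TOKEN"]  # resolved server-side, never placed in the prompt
    req = urllib.request.Request(
        f"https://ci.example.internal/builds/{build_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        # Truncate before the result re-enters the context window.
        return resp.read().decode("utf-8", errors="replace")[:2000]
```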
In many agent-based pipelines, the model decides which tool to call, and how. Without guardrails, this gives attackers control over execution. Red flags include unconstrained tool selection, unvalidated arguments, and no access controls on which context sources can trigger execution.
When context is treated as a shared buffer, any component can insert instructions, override priorities, or trigger behavior, including ones the model was never meant to access.
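A small sketch of what guardrails can look like, using assumed tool names: the model may only request tools from an explicit allowlist, and arguments are type-checked before anything executes.

```python
# Sketch with assumed tool names; the real implementations are not shown.
ALLOWED_TOOLS = {
    "get_build_status": {"build_id": str},
    "list_open_prs": {},
}

def dispatch_tool_call(name: str, args: dict) -> None:
    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        raise PermissionError(f"tool '{name}' is not allowlisted")
    for key, expected_type in spec.items():
        if not isinstance(args.get(key), expected_type):
            raise ValueError(f"invalid or missing argument '{key}' for {name}")
    # Only after both checks pass does the call reach the real tool implementation.
```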
Too many teams keep prompts in scattered YAML files or string templates. There’s no review, no change tracking, and no test coverage. This leads to silent behavior changes, untracked edits, and prompts no one can audit or roll back.
Prompts define behavior. That makes them code, and they need to be treated with the same level of rigor as any other logic in your stack.
Persistent memory can introduce serious cross-session and cross-user risks when it isn’t scoped. Issues include memory that never expires, no isolation between users or sessions, and buffers shared across environments.
Once something lands in memory, especially something malicious or sensitive, it’s difficult to detect and harder to clean up. Context doesn’t forget unless you force it to.
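A rough sketch of scoped memory (illustrative class, in-memory only; a real deployment would back this with isolated, encrypted storage): entries are keyed by tenant and session and carry a TTL, so nothing persists across users or outlives its window.

```python
# Illustrative in-memory store with per-(tenant, session) scoping and TTL expiry.
import time
from collections import defaultdict

class ScopedMemory:
    def __init__(self, ttl_seconds: int = 1800):
        self.ttl = ttl_seconds
        self._store = defaultdict(list)  # (tenant_id, session_id) -> [(timestamp, text)]

    def add(self, tenant_id: str, session_id: str, text: str) -> None:
        self._store[(tenant_id, session_id)].append((time.time(), text))

    def recall(self, tenant_id: str, session_id: str) -> list[str]:
        cutoff = time.time() - self.ttl
        fresh = [(ts, t) for ts, t in self._store[(tenant_id, session_id)] if ts >= cutoff]
        self._store[(tenant_id, session_id)] = fresh  # expired entries are dropped for good
        return [t for _, t in fresh]
```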
Each of these red flags points to an architectural decision that either wasn’t reviewed or wasn’t threat modeled. These aren’t niche risks or theoretical edge cases. They show up in live deployments, and they create paths for context manipulation, data exposure, and system compromise. You need to start asking who controls context, how it’s assembled, and where the boundaries are supposed to be. If those answers aren’t clear, then you’re already exposed.
What you need is control. Most MCP risks come from how context is assembled, stored, and executed. And you can fix a lot of that with targeted changes. The steps below are changes you can apply right now to reduce exposure across staging and production.
Before anything touches the model, it needs to be sanitized and structured. That applies to user messages, retrieved documents, tool outputs, and memory inserts.
What to implement:
- Enforce a schema on every input source before it reaches the assembler.
- Escape or neutralize formatting tokens in user-controlled text.
- Attach origin metadata (source, role, timestamp) to every context element.
All context builders should enforce this as part of pipeline logic.
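As a sketch of what that pipeline check might look like (names, roles, and limits are illustrative, not a specific library's API): every input is validated against a schema, formatting tokens are escaped, and origin metadata is attached before the context builder accepts it.

```python
# Illustrative validation step run before any content joins the context.
import html
import time
from dataclasses import dataclass, field

VALID_ROLES = {"user", "document", "tool"}
MAX_LEN = 8000

@dataclass
class ContextItem:
    origin: str    # e.g. "ui.chat", "rag.retriever"
    role: str      # one of VALID_ROLES
    content: str
    received_at: float = field(default_factory=time.time)

def validate_and_scope(origin: str, role: str, raw: str) -> ContextItem:
    if role not in VALID_ROLES:
        raise ValueError(f"role '{role}' is not allowed from {origin}")
    if len(raw) > MAX_LEN:
        raise ValueError("input exceeds the per-source length budget")
    # Escape markup and template-style tokens so untrusted text can't be
    # interpreted as structure by downstream prompt templates.
    cleaned = html.escape(raw).replace("{", "{{").replace("}", "}}")
    return ContextItem(origin=origin, role=role, content=cleaned)
```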
Rather than relying on a single max-token limit, define granular budgets across all context contributors.
What to implement:
- A per-source token budget for system prompts, memory, retrieval, and user input.
- Trimming policies that drop the oldest or lowest-priority content first.
- Runtime rejection of context elements that exceed their allocation.
This ensures that the model always receives a predictable and controlled context window, even under attack conditions.
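A minimal sketch of per-source budgeting; the budgets and the word-split approximation are illustrative, so swap in your actual tokenizer and limits.

```python
# Illustrative per-source budgets; token counting is approximated by a word split.
BUDGETS = {"system": 1000, "memory": 2000, "retrieval": 3000, "user": 1500}

def enforce_budget(source: str, text: str) -> str:
    budget = BUDGETS[source]
    words = text.split()
    if len(words) <= budget:
        return text
    # Keep the most recent content so one source can't crowd out the rest of the window.
    return " ".join(words[-budget:])
```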
Every model call should produce a reproducible snapshot of the exact context that was passed to the LLM.
What to implement:
- A serialized, versioned snapshot of the full assembled context at every inference call.
- An inference ID that ties the snapshot to the model output and any downstream actions.
- Immutable, auditable storage for those records.
This gives you forensic visibility and operational clarity, and avoids surprises when things go sideways.
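Here's one way that snapshot could look, sketched with Python's standard library (the file path, field names, and component keys are assumptions): the assembled components are serialized, hashed, and appended to a log keyed by an inference ID before the model call.

```python
# Illustrative append-only snapshot of the assembled context per inference.
import hashlib
import json
import time
import uuid

def snapshot_context(components: dict) -> dict:
    # components: e.g. {"system": ..., "user": ..., "memory": ..., "tools": ..., "retrieval": ...}
    payload = json.dumps(components, sort_keys=True, ensure_ascii=False)
    record = {
        "inference_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "context_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "components": components,
    }
    with open("inference_log.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```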
System prompts define how the model behaves. They must be immutable at runtime unless explicitly updated through a secure pipeline.
What to implement:
- Cryptographic signatures on system prompts, verified before every run.
- Version IDs for each template, with changes reviewed and approved through a secure pipeline.
- Runtime rejection of any prompt that fails verification.
This prevents attackers (or internal devs) from modifying critical behavior paths without detection.
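A sketch of runtime verification using an HMAC (key management is simplified to an environment variable for illustration): the signature is computed in the release pipeline, and the context builder refuses any system prompt that fails the check.

```python
# Illustrative HMAC check; key handling is simplified for the sketch.
import hashlib
import hmac
import os

def sign_prompt(prompt: str, key: bytes) -> str:
    # Run in the release pipeline; ship the signature alongside the template.
    return hmac.new(key, prompt.encode("utf-8"), hashlib.sha256).hexdigest()

def load_system_prompt(prompt: str, signature: str) -> str:
    key = os.environ["PROMPT_SIGNING_KEY"].encode("utf-8")
    if not hmac.compare_digest(sign_prompt(prompt, key), signature):
        raise RuntimeError("system prompt failed signature verification; refusing to run")
    return prompt
```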
Most teams only validate user input. You need to validate the entire context chain and the model output before any action is taken.
What to implement:
- Pre-inference guards that inspect the assembled context and reject override attempts before the model runs.
- Post-inference guards that parse and validate model output before it can trigger tools, updates, or downstream actions.
This is where you stop inference from turning into compromise.
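A compact sketch of the guard pair (patterns and the expected action schema are illustrative, not exhaustive): the pre-inference guard rejects context that tries to override system instructions, and the post-inference guard parses the model's proposed action and checks it against an allowlist before anything runs.

```python
# Illustrative pre/post-inference guards around a tool-using model call.
import json
import re

OVERRIDE_PATTERN = re.compile(r"(?i)ignore (all|previous) instructions|act as the system")

def pre_inference_guard(assembled_context: str) -> None:
    if OVERRIDE_PATTERN.search(assembled_context):
        raise ValueError("context rejected: instruction-override pattern detected")

def post_inference_guard(model_output: str, allowed_actions: set[str]) -> dict:
    try:
        action = json.loads(model_output)  # expect a structured action, not free text
    except json.JSONDecodeError as exc:
        raise ValueError("output rejected: not a parseable action") from exc
    if action.get("tool") not in allowed_actions:
        raise PermissionError(f"output rejected: tool '{action.get('tool')}' is not allowed")
    return action
```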
You don’t need to build this all in one sprint, but each of these steps closes a real attack path. MCP is your application’s control plane. And the longer it stays unreviewed, the more exposed your systems become. Secure the context. Secure the behavior. Secure the outcomes.
It’s not enough to ask whether a platform uses GenAI securely. You need to know how it manages prompt context, memory, and tool execution, because that’s where the control plane lives. Vendors and open-source tools that rely on MCPs should be ready to answer detailed questions. If they can’t, that’s your signal the stack wasn’t built with security in mind.
These are the questions you should be asking, and the features they should be able to prove.
Don't settle for general claims like “we sanitize inputs” or “we use standard LLM guardrails.” Push into the protocol layer and make them show their work.
Start with:
- How is prompt history stored and isolated across sessions and users?
- How can we view and audit every change made to the context, including memory and tool output?
- What protections prevent tool misuse or execution abuse from inside a prompt?
- Do context assembly and prompt templates support cryptographic signing and version control?
- How do you enforce memory boundaries, TTLs, and isolation between environments?
These are baselines for any platform that claims it handles sensitive GenAI workflows.
Any system that constructs or manages prompts on your behalf should be held to the same standard you’d apply to code execution or API integration. These are the features that matter most:
The platform must log the complete prompt window passed to the model, including system prompts, user input, tool output, memory, and retrieved content. That log must be versioned, immutable, and tied to a traceable inference ID.
Inputs from external sources (users, tools, documents) must be validated against a schema. Output that feeds into execution paths must be parsed and inspected before triggering tools or updates.
Every component that contributes to prompt assembly (user input, memory, tool output) should be scoped to a role or permission boundary. There should be no anonymous or global context inserts.
You need visibility into which prompt template or routing logic was used at every step. This includes template version IDs, changes made, and who approved the update.
The platform should allow you to enforce memory boundaries, set TTLs, and isolate memory between users, sessions, and environments.
There should be safeguards that detect when a context element tries to override or impersonate a system-level instruction. The system must isolate user input from trusted logic and flag deviations before inference.
These are the capabilities that separate secure platforms from those still in early prototyping mode. If a vendor can’t show you how context is stored, filtered, and traced, they’re not ready to run GenAI in environments that handle sensitive data, execute actions, or operate at scale.
Context is infrastructure. Start vetting it like you would any privileged system, and don’t move forward until the protocol layer is just as hardened as the model itself.
Security leaders tend to focus on model safety, but that's not where most real-world compromises begin. The underlying protocol stack (how context is handled, how tools are called, how memory persists) is often treated like glue code when it should be treated like the control layer it is.
That’s the change that needs to happen. MCP is part of your infrastructure. The sooner your security reviews treat it that way, the less cleanup you'll be doing later.
What’s coming next is more interconnectivity. Agent stacks are becoming more autonomous. Toolchains are getting more powerful. And decisions that used to be supervised will soon be entirely model-driven. That makes securing the protocol path the only sustainable way to scale GenAI without inviting compromise.
Don’t wait until these systems are embedded into CI/CD, customer workflows, or production deployments. By then, the blast radius is already too big.
we45’s AI Security Services give your team the expertise and testing depth to evaluate how your GenAI systems handle context. From prompt injection exposure to agent-based attack paths, our model context protocol assessments help you secure the logic that actually runs your AI workflows. Learn more here: we45 Model Context Protocol Security Assessment.
Model Context Protocol (MCP) is the system that manages how prompts, tokens, memory, and instructions are delivered to a Large Language Model (LLM) during runtime. It acts as the "glue" between the LLM and the rest of the application stack, controlling the stream of context that the model sees, remembers, and acts on.
MCP is a critical attack surface because it rarely has built-in security controls like access control or isolation. Attackers can exploit this layer to inject hostile prompts, manipulate memory, leak sensitive data, and exploit dynamic tool execution, even if the base LLM itself is secure.
The most critical vulnerabilities include:
- Context injection through unsanitized input: raw user inputs are appended directly to prompt templates without structured encoding or isolation, allowing attackers to poison the input stream.
- Memory leakage: persistent context buffers or shared memory between users and sessions can cause sensitive or stale data to leak into new prompts.
- Untrusted retrieval in RAG pipelines: documents pulled from external or user-submitted sources are often inserted into the prompt without validation or sanitization, allowing arbitrary content injection.
- Missing authentication and scoping on context contributors: a lack of verification on who is allowed to inject context (e.g., backend services, plugins) creates paths for lateral movement.
- No forensic visibility: systems fail to log the full assembled prompt (including system instructions, memory, and tool outputs), making incident investigation impossible.
In LLM agent setups, a compromised context can hijack the agent’s reasoning chain. This redirects the model to execute tools it shouldn't touch or sends manipulated parameters into external systems. If these tools control deployment, configuration, or workflows, unauthorized actions like modifying deployment variables or triggering malicious code can occur. This is a path to full system compromise, including Remote Code Execution (RCE).
Common red flags include:
- Using orchestration frameworks (like LangChain or Semantic Kernel) with insecure default behavior still active.
- Injecting raw user input into prompts without structured filtering or escaping.
- No logging of the full inference context (system prompts, retrieved documents, memory, tool outputs).
- Embedding secrets or static credentials directly into system prompts or tool logic.
- Lack of access controls on tool execution or the prompt assembly process.
- Treating prompt logic as static configuration instead of version-controlled, executable code.
- Using persistent memory without expiration (TTL) or isolation between users and sessions.
You can reduce exposure with targeted changes focused on control:
- Validate and scope all prompt inputs: enforce schemas, escape formatting tokens in user input, and attach origin metadata before assembly.
- Set strict token budgets: define granular token limits for each context source (user input, RAG, memory) and enforce trimming and rejection policies at runtime.
- Log and version the full context: capture a reproducible, auditable snapshot of the entire assembled prompt at every inference call.
- Sign and lock system prompts: use cryptographic signatures and version IDs to ensure system prompts are immutable at runtime unless securely updated.
- Deploy enforcement controls: implement pre-inference guards to reject malicious prompts and post-inference guards to validate model outputs before they trigger tools or downstream actions.
Do not accept general claims. Demand proof of features and ask specific questions about the protocol layer:
- How is prompt history stored and isolated across sessions and users?
- How can I view and audit every change made to the context (including memory and tool output)?
- What protections prevent tool misuse or execution abuse from inside a prompt?
- Do context assembly and prompt templates support cryptographic signing and version control?
- How do you enforce memory boundaries, TTLs, and isolation between environments?