How to Secure LLMs, RAG Pipelines, and Agents as One System

PUBLISHED: February 5, 2026 | BY: Ganga Sumanth

Still pretending that your AI security has not already fallen behind your business?

LLMs, RAG pipelines, and agents are no longer parked in innovation corners or demo environments. They sit inside customer workflows, internal decision systems, and data paths that touch regulated and sensitive information every day. Teams ship them fast because the business demands it, then protect them with controls designed for static apps and predictable inputs. That mismatch is how data leaks without alerts, how model behavior shifts without ownership, and how compliance gaps form long before anyone notices.

Nothing breaks loudly. No service goes down. Instead, external data sources influence outputs, retrieval layers expose information they should not, and agents act on assumptions no one reviewed end to end. Security teams look at the model, check a few boxes, and move on, while the real risk lives across the chain that feeds, grounds, and drives that model.

Table of Contents

  1. An AI system is a supply chain, and attackers go after the weak links
  2. LLMs, RAG pipelines, and agents break the assumptions your security program was built on
  3. Most security reviews miss AI supply chain risk because the process assumes stability
  4. Securing AI means securing the system

An AI system is a supply chain, and attackers go after the weak links

Talking about the model as if it were the whole system is already creating security blind spots, because the model rarely acts alone in production. What runs in your environment is a layered supply chain where data, orchestration, retrieval, tools, and permissions all shape outcomes, often more than the model weights do.

Once you map it that way, the attack surface gets clearer and a lot less comfortable. Each layer introduces its own failure modes, and trust tends to bleed across layers because teams assume upstream controls handle it. That is how you end up with a model that looks secure in isolation while the overall system does things you would never approve of in a design review.

The AI supply chain components you are running

A useful mental model is to treat every dependency that can influence model output or trigger actions as part of the AI supply chain. In most production deployments, that supply chain includes:

Foundation models (external or internal)

  • Risk drivers
    • Model behavior can change across versions, model families, or vendor updates, which shifts safety and security characteristics without code changes in your stack (see the pinning sketch after this list).
    • Data exposure risk exists through prompts, tool outputs, and context windows, especially when sensitive content gets injected into the model input.
    • Guardrails placed only at the model boundary often miss risks introduced earlier in the chain (retrieval, orchestration, tool outputs).
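
One way to catch the first risk driver above is to pin model versions the way you pin dependencies, and fail closed when the serving layer drifts. A minimal sketch, assuming a hypothetical gateway lookup (get_serving_metadata) and made-up model identifiers rather than any real vendor API:

```python
# A minimal sketch: fail closed when the model actually serving traffic differs
# from the version your review approved. The gateway lookup and its "model_id"
# field are assumptions, not a real API.

PINNED_MODELS = {
    "support-assistant": "vendor-model-2024-06-01",
    "triage-agent": "vendor-model-2024-04-15",
}

def get_serving_metadata(app_name: str) -> dict:
    """Hypothetical lookup against whatever gateway or proxy fronts the vendor API."""
    return {"model_id": "vendor-model-2024-06-01"}

def check_model_pin(app_name: str) -> None:
    served = get_serving_metadata(app_name)["model_id"]
    pinned = PINNED_MODELS.get(app_name)
    if pinned is None:
        raise RuntimeError(f"{app_name}: no pinned model on record, review required")
    if served != pinned:
        # Treat an unreviewed model change like an unreviewed code change.
        raise RuntimeError(f"{app_name}: serving {served}, approved {pinned}; block and re-review")

check_model_pin("support-assistant")
```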

Training and fine-tuning data

  • Risk drivers
    • Poisoned or low-integrity training data can bake in harmful behaviors, data memorization, or biased decision logic that shows up later as normal model output.
    • Data lineage breaks easily. Teams struggle to prove what went into training or fine-tuning, which becomes a governance and audit issue.
    • Leakage risk increases when fine-tuning data includes proprietary or regulated information that later surfaces through model responses or embeddings.

RAG data sources and vector stores

  • Risk drivers
    • RAG introduces a second input channel that bypasses many AppSec assumptions, because content enters the system as knowledge rather than user input.
    • Retrieval relevance becomes a security control. Weak ranking, weak filtering, or poor chunking can pull the wrong content into the model context.
    • Multi-tenant or poorly segmented vector stores create cross-tenant exposure risk, even when the model has no direct access to raw records (see the scoping sketch after this list).
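
A minimal sketch of the kind of scoping that keeps a shared vector store from becoming a cross-tenant leak: the tenant filter lives inside the retrieval wrapper, so callers cannot forget it. The Chunk structure and the in-memory candidate list are illustrative assumptions, not any specific vector database API.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    tenant_id: str
    source: str
    text: str
    score: float  # similarity score already computed upstream

def retrieve_for_tenant(candidates: list[Chunk], tenant_id: str, k: int = 5) -> list[Chunk]:
    # Hard filter first: a chunk from another tenant never reaches ranking,
    # no matter how relevant the similarity search thinks it is.
    scoped = [c for c in candidates if c.tenant_id == tenant_id]
    return sorted(scoped, key=lambda c: c.score, reverse=True)[:k]

candidates = [
    Chunk("acme", "runbook.md", "Rotate keys via the internal portal.", 0.91),
    Chunk("globex", "contract.pdf", "Globex renewal terms...", 0.91),
]
print(retrieve_for_tenant(candidates, tenant_id="acme"))
```

The design choice matters more than the code: segmentation enforced inside the retrieval layer survives prompt changes and orchestration changes, while segmentation left to the caller does not.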

Orchestration logic and prompts

  • Risk drivers
    • Prompt templates, system messages, and routing logic become executable policy. Small changes alter what the model is allowed to do, what it sees, and how it prioritizes instructions (see the change-control sketch after this list).
    • Any external content that reaches a prompt can change behavior, including documents, tickets, emails, web pages, and logs.
    • Guardrails that rely on string checks or pattern matching tend to degrade fast once prompts become dynamic and multi-step.
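
Treating prompt templates as executable policy implies change control. A minimal sketch, assuming approved hashes are recorded at review time (for example, committed next to the threat model) and checked before a template is used:

```python
import hashlib

def template_hash(template: str) -> str:
    return hashlib.sha256(template.encode("utf-8")).hexdigest()

SYSTEM_PROMPT = (
    "You are a support assistant. Only answer from the provided policy context. "
    "Never reveal internal system details."
)

# Recorded at review time; computed here from the template above so the example
# runs end to end.
APPROVED_HASHES = {"support_system_prompt": template_hash(SYSTEM_PROMPT)}

def load_template(name: str, template: str) -> str:
    if APPROVED_HASHES.get(name) != template_hash(template):
        # An unreviewed prompt edit is a policy change; fail closed.
        raise RuntimeError(f"Prompt template '{name}' changed without security review")
    return template

prompt = load_template("support_system_prompt", SYSTEM_PROMPT)
```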

Tools, plugins, and external APIs

  • Risk drivers
    • Tool output becomes model input. Any tool that returns untrusted content can drive downstream decisions, including tools your team considers internal (see the sanitization sketch after this list).
    • Plugins expand the trust boundary. A plugin that can read files, call internal services, or create tickets becomes a path to data access and operational impact.
    • API schemas and error messages can leak sensitive details. The model can amplify that leakage when it summarizes or reformats outputs.
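
A minimal sketch of handling tool output as untrusted model input: redact obvious secret shapes and label the provenance before the text enters a prompt. The regex patterns and the wrapper format are assumptions to adapt, not a complete control.

```python
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS access key id shape
    re.compile(r"(?i)bearer\s+[a-z0-9\-._~+/]+=*"),   # bearer tokens
]

def sanitize_tool_output(tool_name: str, raw: str) -> str:
    cleaned = raw
    for pattern in SECRET_PATTERNS:
        cleaned = pattern.sub("[REDACTED]", cleaned)
    # Make provenance explicit so orchestration code (and reviewers) can see
    # that this text is data, not instructions.
    return f"<tool_output source='{tool_name}' trust='untrusted'>\n{cleaned}\n</tool_output>"

print(sanitize_tool_output("ticket_lookup", "Auth header: Bearer abc123.def"))
```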

Agents executing actions

  • Risk drivers
    • Agents turn model output into state changes. This shifts risk from bad answer to bad action.
    • Permission scope becomes the real control plane. Broad scopes create high-impact failure modes when an agent takes the wrong path (see the least-privilege sketch after this list).
    • Multi-step autonomy makes intent drift possible, because intermediate decisions compound and are rarely reviewed with the same rigor as code changes.
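
A minimal sketch of that least-privilege point: an explicit tool allowlist plus a human approval gate for state-changing actions. Tool names, risk tiers, and the approval hook are illustrative assumptions.

```python
HIGH_RISK_TOOLS = {"modify_access", "issue_refund", "trigger_deploy"}
ALLOWED_TOOLS = {"search_docs", "read_ticket", "create_ticket", "modify_access"}

def request_human_approval(tool: str, args: dict) -> bool:
    """Hypothetical hook: route to a reviewer (chat, ticket, pager) and block."""
    print(f"Approval required for {tool} with {args}")
    return False  # default deny until a human says yes

def execute_tool(tool: str, args: dict) -> str:
    if tool not in ALLOWED_TOOLS:
        return f"denied: {tool} is not in this agent's allowlist"
    if tool in HIGH_RISK_TOOLS and not request_human_approval(tool, args):
        return f"blocked: {tool} needs human approval"
    # ...dispatch to the real tool implementation here...
    return f"executed {tool}"

print(execute_tool("modify_access", {"user": "svc-bot", "role": "admin"}))
```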

Why each layer creates independent risk

These layers do not fail in the same way, and they do not fail at the same time. Model security work (adversarial testing, jailbreak resistance, prompt hardening) covers one slice. Data integrity work (provenance, access control, segmentation, monitoring) covers another slice. Orchestration and agent work brings in identity, authorization, workflow safety, and transactional controls.

Problems show up when teams assume one layer’s controls magically apply to the next layer. That assumption is everywhere in AI deployments, because the system feels like one application even when it behaves like a pipeline.

Trust bleeds across the chain in a few common ways:

  • The model trusts retrieved content because the retrieval layer presented it as authoritative.
  • The orchestration layer trusts tool outputs because the tool sits behind your network boundary.
  • The agent trusts its own intermediate reasoning because nobody designed a hard stop or approval gate for high-risk actions.
  • Engineers trust the whole system because the model vendor says it is safe, even though the system’s unsafe behavior is coming from your data and your permissions.

Two failure patterns that prove this point

  1. A secure model producing unsafe outcomes because retrieval data is poisoned

Consider a support assistant grounded in internal policy docs and runbooks. The model has strong safety behavior in vendor testing, and prompt hardening looks solid. The system still produces unsafe recommendations after a policy document gets modified, or a new document gets ingested with malicious or misleading content.

This happens because RAG treats retrieved text as high-priority context. Poisoned retrieval data can:

  • Insert instructions that override policy constraints inside the prompt context.
  • Introduce false technical guidance that leads to insecure remediation steps.
  • Steer the model toward disclosing restricted information by shaping what it “believes” is allowed.

The model is doing exactly what it was designed to do, which is to follow the best available context. The failure lives in data integrity, ingestion controls, retrieval filtering, and provenance checks.
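A minimal sketch of what ingestion-time provenance can look like, assuming a source allowlist and a simple hash ledger; the record format is illustrative, but the idea is that you can later prove what the model was grounded on and when that changed:

```python
import hashlib
import json
from datetime import datetime, timezone

APPROVED_SOURCES = {"policy-repo", "runbooks", "product-docs"}

def ingest_document(source: str, doc_id: str, content: str, ledger: list[dict]) -> bool:
    if source not in APPROVED_SOURCES:
        # Unknown sources never become "knowledge" silently.
        return False
    ledger.append({
        "doc_id": doc_id,
        "source": source,
        "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    })
    # ...chunk, embed, and index the content here...
    return True

ledger: list[dict] = []
ingest_document("policy-repo", "access-policy-v3", "MFA is required for ...", ledger)
print(json.dumps(ledger, indent=2))
```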

  2. A well-reviewed application exposing data because an agent is over-permissioned

Now take an internal engineering agent that can read documentation, query internal systems, and create or update tickets. The application code passes review. The model passes red teaming. The agent still leaks or misroutes sensitive data because it has broad access and weak boundaries.

Over-permissioned agents fail in predictable ways:

  • They fetch data that is technically accessible but inappropriate for the current user or context.
  • They attach sensitive logs or customer records to tickets or chat threads because the workflow allows it.
  • They call internal APIs in the wrong order, triggering unintended state changes that expose information downstream.

The root cause is usually simple and frustrating. The agent has privileges that were granted for convenience, then nobody went back to tighten scopes, add approval points, or enforce least privilege with real policy.
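A minimal sketch of the kind of scope audit that catches those convenience grants: compare what the agent's identity actually holds against what its registered tools need, and flag the excess. Scope names and the registry format are assumptions.

```python
REQUIRED_SCOPES = {
    "search_docs": {"docs:read"},
    "read_ticket": {"tickets:read"},
    "create_ticket": {"tickets:write"},
}

def audit_agent_scopes(granted: set[str], tools: list[str]) -> set[str]:
    needed = set().union(*(REQUIRED_SCOPES.get(t, set()) for t in tools))
    return granted - needed  # anything left here was granted for convenience

granted = {"docs:read", "tickets:read", "tickets:write", "customers:read", "deploy:write"}
excess = audit_agent_scopes(granted, ["search_docs", "read_ticket", "create_ticket"])
print(f"Scopes to remove or justify: {sorted(excess)}")
```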

AI security does not hold together when you secure only one component. You need coverage across the chain because attackers target whichever layer has the weakest controls, the loosest permissions, or the least monitoring.

LLMs, RAG pipelines, and agents break the assumptions your security program was built on

Most AppSec and platform security programs are built on a set of assumptions that have held up for decades: inputs are bounded and can be validated, code paths are reviewable, and the system behaves predictably when it runs. LLM-based systems violate those assumptions in ways that are easy to underestimate during design reviews.

That is not a gap you can wave away. Traditional controls still matter, but they do not cover the failure modes that come from unbounded inputs, probabilistic outputs, retrieval-driven context, and autonomous tool use.

Why LLMs break core AppSec assumptions

LLMs change what execution means in an application. Security programs are built to protect code paths, APIs, and data flows that engineers can enumerate, reason about, and constrain ahead of time. LLM-driven systems do not behave that way, because language, context, and inference become part of runtime behavior, not just inputs passing through it.

  • Inputs arrive from many channels at once, including user prompts, retrieved documents, tool outputs, system instructions, prior conversations, and data pulled dynamically from internal systems.
  • There is rarely a clean boundary between user-controlled input and system-controlled input, which causes trust decisions to blur inside the prompt context (see the trust-labeling sketch after this list).
  • Harmful instructions do not need to look malicious, because a single sentence framed as guidance, documentation, or policy can override intent.
  • Input size, structure, and meaning are unpredictable, which breaks assumptions around schemas, validation rules, and edge-case handling.
  • Outputs are probabilistic and vary across runs, context order, temperature settings, and model versions, even when the prompt looks unchanged.
  • Testing loses its predictive value because you cannot define a stable set of expected outputs that hold under real-world variation.
  • Policy enforcement that relies on the model refusing unsafe requests degrades over time as prompts, workflows, and tools evolve.
  • Execution logic shifts into prompts and system messages, which bypass static analysis, code review rigor, and most CI/CD security controls.
  • Small prompt edits can materially change authorization behavior, decision thresholds, or tool usage without triggering security review.
  • Trust decisions become implicit, because the model chooses which parts of the context matter most, and that prioritization is opaque.
  • Logs show inputs and outputs but rarely explain why one instruction outweighed another, which complicates investigation and assurance.
  • Failures do not surface as crashes, exceptions, or access violations, so traditional detection and alerting never fire.
  • The system continues operating normally while producing unsafe, non-compliant, or policy-violating behavior that looks reasonable on the surface.
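
A minimal sketch of making that boundary explicit: every context segment carries a source and a trust label, and the assembled context is logged with its provenance so investigations are not limited to raw inputs and outputs. The segment structure and log format are assumptions.

```python
from dataclasses import dataclass
import json

@dataclass
class ContextSegment:
    source: str   # "system", "user", "retrieval", "tool:ticket_lookup", ...
    trust: str    # "trusted" or "untrusted"
    text: str

def assemble_context(segments: list[ContextSegment]) -> str:
    # Log the provenance of every segment before the prompt is sent,
    # not just the final output.
    print(json.dumps([{"source": s.source, "trust": s.trust, "chars": len(s.text)}
                      for s in segments]))
    return "\n\n".join(f"[{s.source} | {s.trust}]\n{s.text}" for s in segments)

context = assemble_context([
    ContextSegment("system", "trusted", "Answer only from the provided policy."),
    ContextSegment("retrieval", "untrusted", "Excerpt from access-policy-v3 ..."),
    ContextSegment("user", "untrusted", "How do I reset a customer's MFA?"),
])
```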

You cannot reason about LLM security the way you reason about application security. Behavior emerges from language, context, and probability, not from fixed logic. That mismatch is where most security programs quietly lose control.

Why agents multiply impact and reduce visibility

Agents take you from "the model said something wrong" to "the system did something wrong," and that is where traditional security assumptions collapse the hardest. Once the model can execute actions through tools, the risk stops being limited to content and becomes operational.

Agents amplify risk in three predictable ways:

They act, not just respond

  • Tool calls create side effects: creating tickets, changing configurations, querying sensitive systems, sending messages, issuing refunds, modifying access, triggering deployments.
  • A bad decision can become a persistent change, which raises the cost of failure and shrinks the window to contain it.

Failures look like normal operations

  • An agent calling internal APIs or automation tools usually produces logs that look identical to legitimate use, especially when it runs under a service identity with broad permissions.
  • Abuse blends in because the sequence is plausible. It is just wrong in intent, and intent is rarely visible in standard telemetry.

Abuse scales faster than human oversight

  • Agents can loop, retry, and chain tool invocations at machine speed. A human reviewer cannot keep up once the system starts making rapid decisions across systems.
  • A compromised goal or instruction set can drive repeated actions, such as invoking the same tool in slightly different ways until it gets the outcome it wants, which resembles persistence and exploration, even though the logs look like automation doing automation.

A grounded example here is an agent repeatedly invoking tools under a compromised goal. A single injected instruction changes the objective from resolving the incident safely to collecting everything that might help, then the agent starts pulling logs, configs, customer records, and credentials from multiple systems because it has permission and because the workflow allowed it. Nothing about that looks like exploitation in the classic sense. It looks like a busy helper, and the damage shows up later when sensitive data lands in the wrong place.
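
A minimal sketch of bounding that behavior: a per-run budget on tool calls and a hard stop when a run touches too many sensitive systems, so a compromised goal cannot quietly turn into bulk collection. The thresholds and the set of sensitive tools are assumptions to tune per workflow.

```python
class RunBudget:
    def __init__(self, max_calls: int = 20, max_sensitive: int = 3):
        self.max_calls = max_calls
        self.max_sensitive = max_sensitive
        self.calls = 0
        self.sensitive_calls = 0

    def allow(self, tool: str, sensitive_tools: set[str]) -> bool:
        self.calls += 1
        if tool in sensitive_tools:
            self.sensitive_calls += 1
        if self.calls > self.max_calls or self.sensitive_calls > self.max_sensitive:
            # Halt the run and hand it to a human instead of letting it explore.
            return False
        return True

budget = RunBudget()
SENSITIVE = {"read_customer_records", "read_credentials", "export_logs"}
for step, tool in enumerate(["search_docs"] + ["read_customer_records"] * 5):
    if not budget.allow(tool, SENSITIVE):
        print(f"run halted at step {step}: budget exceeded for {tool}")
        break
```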

Most security reviews miss AI supply chain risk because the process assumes stability

The review process assumes the system you approved stays the system you run. That assumption collapses the moment LLMs, retrieval, and agent workflows enter production, because the behavior keeps changing even when the code looks unchanged.

This is a process failure and not a tooling gap. Plenty of teams have scanners, policies, and checklists, and they still end up approving AI systems once, then losing visibility as those systems evolve week after week. By the time something goes wrong, the governance artifacts can describe what was intended, but they cannot explain what actually happened.

One-time reviews stop working once the system keeps moving

AI systems change on multiple planes at the same time, and each plane can shift risk without triggering the kind of review your program is built around.

  • Model behavior shifts across vendor updates, model swaps, configuration changes (temperature, system prompts, routing), and context handling changes.
  • Data changes continuously through ingestion jobs, new sources, updated documents, new embeddings, and shifting retrieval rankings that alter what the model sees at runtime.
  • Behavior evolves after deployment because prompts get tuned, tool chains expand, new agent skills get added, and teams optimize for product outcomes that unintentionally widen access and autonomy.
  • Attack surfaces expand quietly when integrations grow, because each new tool and API turns into another source of untrusted input and another path for action.

A one-time approval works for a static service with controlled inputs and deterministic logic. It does not hold for systems where output and action are shaped by changing data and changing orchestration.
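
A minimal sketch of drift detection between reviews: snapshot the pieces that shape behavior into a manifest, then diff the live system against the approved one and require re-review on any difference. The manifest fields are illustrative assumptions; the point is that a change here is a security-relevant change even when no application code changed.

```python
APPROVED = {
    "model": "vendor-model-2024-06-01",
    "prompt_hashes": {"support_system_prompt": "9c1f0e"},  # placeholder digest
    "retrieval_sources": ["policy-repo", "runbooks"],
    "tools": ["search_docs", "read_ticket", "create_ticket"],
    "agent_scopes": ["docs:read", "tickets:read", "tickets:write"],
}

def manifest_drift(approved: dict, live: dict) -> dict:
    return {k: {"approved": approved.get(k), "live": live.get(k)}
            for k in set(approved) | set(live)
            if approved.get(k) != live.get(k)}

# Simulate a quiet expansion: a new tool was added without a review.
live = dict(APPROVED, tools=APPROVED["tools"] + ["query_customer_db"])
drift = manifest_drift(APPROVED, live)
if drift:
    print("re-review required:", drift)
```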

What traditional reviews consistently miss in AI systems

Traditional design and threat model reviews focus on components that look like software artifacts: services, endpoints, dependencies, and infrastructure. AI introduces security-critical artifacts that rarely show up in the review packet, even though they drive real behavior.

  • Prompts and system instructions act like policy and control logic, and teams often treat them as product text rather than security artifacts that require change control, versioning, and review.
  • Embeddings and retrieval logic effectively decide what the model is allowed to know at runtime, yet they rarely get integrity checks, provenance requirements, or clear boundaries for what content can influence decisions.
  • Runtime decision paths are dynamic and context-dependent, which means review teams cannot reason about a single, stable execution flow the way they do with normal code.
  • Cross-system interactions become the real risk surface once the model can call tools, pull data from core services, and trigger actions across identity, ticketing, customer support, finance, engineering, and operations.
  • Authorization becomes harder to validate because permissions are exercised indirectly through agent tool calls, and the decision to call a tool can be driven by retrieved content rather than explicit user intent.

The uncomfortable part is that many of these risks live outside the code diffs that normally trigger review. Teams deploy small changes that are actually security changes, because they alter what the system can see, decide, or do.

Why the most common review outputs create a false sense of safety

A lot of AI security work looks solid on paper, then fails during real-world use because the artifacts are static while the system is dynamic.

  • Static threat models document what the system was supposed to be, and they age out the moment prompts, retrieval sources, and toolchains change.
  • Compliance-only assessments confirm that a process happened, but they rarely test behavioral safety under real inputs, real data, and real tool permissions.
  • Controls applied after design decisions are locked in tend to become wrappers, such as filters and guardrails bolted onto a system that already has broad retrieval access and broad agent permissions.
  • Model risk controls get treated as the whole program, while the retrieval layer, orchestration layer, and tool layer remain under-reviewed and under-monitored.

This is how teams end up with a system that passed review and still leaks data or takes unsafe actions. The review approved a snapshot. Production moved on.

A security review that assumes AI systems are static becomes obsolete the moment it is signed off. You need a review model that treats prompts, retrieval, embeddings, and tool permissions as first-class security artifacts, and you need ongoing visibility into how behavior changes as the system evolves.

Securing AI means securing the system

AI risk is systemic. It lives in the connections between models, data, retrieval layers, tools, and the decisions those pieces drive together. Asking whether the model is secure misses where real failures happen. The better question is where the AI system can be manipulated today without anyone noticing.

CISOs who stay ahead treat AI like critical infrastructure. They assume behavior will change after deployment, push for continuous and system-aware threat modeling, and enforce explicit trust boundaries and human control points where autonomy creates real risk. This does not slow adoption. It keeps security from turning into a post-incident explanation exercise.

Most teams are still building visibility into their AI supply chain, and that is the right place to start. we45 helps organizations map AI dependencies, identify where trust is assumed instead of enforced, and design security controls that hold up as systems evolve. This is how you keep AI from becoming the most expensive blind spot in the organization.

FAQ

What is the primary misconception that causes security blind spots in AI systems?

The biggest security blind spot comes from treating the model as the entire system. In reality, the production environment runs a layered AI supply chain where data, orchestration, retrieval, tools, and permissions all shape outcomes. The true risk lies across this entire chain, which feeds, grounds, and drives the model, often more than the model's weights do.

What are the key security risk drivers associated with Foundation Models?

Risk drivers include model behavior shifting across versions or vendor updates without code changes in your stack, data exposure risk via prompts and context windows when sensitive content is injected, and the failure of guardrails placed only at the model boundary to catch risks introduced earlier in the chain (e.g., retrieval).

How does Retrieval Augmented Generation (RAG) introduce new and unique security risks?

RAG introduces a second input channel that bypasses many traditional application security assumptions because content enters the system as knowledge rather than user input. Risks arise from retrieval relevance, where weak filtering or chunking can pull wrong content into the model context, and from multi-tenant vector stores that can create cross-tenant exposure risk.

What makes AI agents a higher operational risk compared to just Large Language Models (LLMs)?

Agents raise the stakes by turning a "bad answer" into a "bad action" because they execute state changes through tool calls (e.g., modifying access, sending messages, creating tickets). This amplifies risk, makes failures look like normal operations (blending abuse into legitimate logs), and allows abuse to scale faster than human oversight, making unintended intent drift possible.

Can a system with a secure, hardened model still produce unsafe or policy-violating outcomes?

Yes. A secure model can still produce unsafe recommendations if the retrieval data is poisoned or contains malicious/misleading content. Since RAG systems treat retrieved text as high-priority context, poisoned data can insert instructions that override policy constraints, leading the model to follow the best available context, even if it is unsafe.

Why are traditional AppSec and platform security programs insufficient for LLM-based systems?

Traditional security is built on assumptions that LLM systems violate: inputs are bounded, code paths are reviewable, and behavior is predictable. LLM systems have unbounded, multi-channel inputs, probabilistic outputs that vary across runs, and execution logic that shifts into prompts/system messages, bypassing standard static analysis and code review processes.

What is the core failure in security reviews that causes AI supply chain risk to be missed?

The core failure is a process assumption of stability. One-time reviews fail because AI systems are dynamic; their behavior continuously changes post-approval due to model updates, continuous data ingestion, prompt tuning, and expanding tool chains. Traditional reviews also miss security-critical artifacts like prompts, retrieval logic, and dynamic runtime decision paths.

What is the recommended strategic approach for securing the AI system supply chain?

The recommended approach is to treat AI as critical infrastructure, recognizing that AI risk is systemic and lives in the connections between all components (models, data, retrieval, tools). This involves assuming behavior will change after deployment, pushing for continuous, system-aware threat modeling, and enforcing explicit trust boundaries and human control points where autonomy creates real risk.

Ganga Sumanth

Ganga Sumanth is an Associate Security Engineer at we45. His natural curiosity finds him diving into various rabbit holes which he then turns into playgrounds and challenges at AppSecEngineer. A passionate speaker and a ready teacher, he takes to various platforms to speak about security vulnerabilities and hardening practices. As an active member of communities like Null and OWASP, he aspires to learn and grow in a giving environment. These days he can be found tinkering with the likes of Go and Rust and their applicability in cloud applications. When not researching the latest security exploits and patches, he's probably raving about some niche add-on to his ever-growing collection of hobbies: Long distance cycling, hobby electronics, gaming, badminton, football, high altitude trekking.