
Still pretending that AI security is not already lagging behind your business?
LLMs, RAG pipelines, and agents are no longer parked in innovation corners or demo environments. They sit inside customer workflows, internal decision systems, and data paths that touch regulated and sensitive information every day. Teams ship them fast because the business demands it, then protect them with controls designed for static apps and predictable inputs. That mismatch is how data leaks without alerts, how model behavior shifts without ownership, and how compliance gaps form long before anyone notices.
Nothing breaks loudly. No service goes down. Instead, external data sources influence outputs, retrieval layers expose information they should not, and agents act on assumptions no one reviewed end to end. Security teams look at the model, check a few boxes, and move on, while the real risk lives across the chain that feeds, grounds, and drives that model.
Talking about the model as if it were the system is already causing security blind spots, because the model rarely acts alone in production. What runs in your environment is a layered supply chain where data, orchestration, retrieval, tools, and permissions all shape outcomes, often more than the model weights do.
Once you map it that way, the attack surface gets clearer and a lot less comfortable. Each layer introduces its own failure modes, and trust tends to bleed across layers because teams assume upstream controls handle it. That is how you end up with a model that looks secure in isolation while the overall system does things you would never approve of in a design review.
A useful mental model is to treat every dependency that can influence model output or trigger actions as part of the AI supply chain. In most production deployments, that supply chain includes:
- the model itself, whether self-hosted, fine-tuned, or consumed through a vendor API
- the data that grounds it: retrieval corpora, embeddings, and vector stores
- the orchestration layer: prompts, system messages, and workflow logic
- the tools and APIs the model can call
- the identities and permissions behind those calls
These layers do not fail in the same way, and they do not fail at the same time. Model security work (adversarial testing, jailbreak resistance, prompt hardening) covers one slice. Data integrity work (provenance, access control, segmentation, monitoring) covers another slice. Orchestration and agent work brings in identity, authorization, workflow safety, and transactional controls.
Problems show up when teams assume one layer’s controls magically apply to the next layer. That assumption is everywhere in AI deployments, because the system feels like one application even when it behaves like a pipeline.
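One way to surface that assumption is to write the chain down and check every layer for an owner, a control set, and monitoring. The sketch below does that in Python; the layer names, owners, and control families are illustrative assumptions, not a standard taxonomy.

```python
# Minimal sketch of an AI supply chain inventory. All names are illustrative.
from dataclasses import dataclass, field


@dataclass
class SupplyChainLayer:
    name: str
    owner: str | None                      # team accountable for this layer
    controls: list[str] = field(default_factory=list)
    monitored: bool = False


LAYERS = [
    SupplyChainLayer("model", "ml-platform",
                     ["adversarial testing", "jailbreak resistance", "prompt hardening"], True),
    SupplyChainLayer("data & retrieval", "data-eng",
                     ["provenance", "access control", "segmentation"], False),
    SupplyChainLayer("orchestration & agents", None,
                     ["authorization", "workflow safety"], False),
    SupplyChainLayer("tools & permissions", "platform", [], False),
]


def coverage_gaps(layers: list[SupplyChainLayer]) -> list[str]:
    """Flag layers where trust is assumed rather than enforced."""
    gaps = []
    for layer in layers:
        if layer.owner is None:
            gaps.append(f"{layer.name}: no accountable owner")
        if not layer.controls:
            gaps.append(f"{layer.name}: no controls defined")
        if not layer.monitored:
            gaps.append(f"{layer.name}: no monitoring")
    return gaps


if __name__ == "__main__":
    for gap in coverage_gaps(LAYERS):
        print(gap)
```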
Trust bleeds across the chain in a few common ways:
- model behavior shifts across versions or vendor updates, with no code change in your stack to flag it
- sensitive content moves through prompts and context windows that nothing downstream treats as a data-exposure path
- guardrails sit only at the model boundary, so risks introduced earlier in the chain, such as retrieval, never get caught
Consider a support assistant grounded in internal policy docs and runbooks. The model has strong safety behavior in vendor testing, and prompt hardening looks solid. The system still produces unsafe recommendations after a policy document gets modified, or a new document gets ingested with malicious or misleading content.
This happens because RAG treats retrieved text as high-priority context. Poisoned retrieval data can:
- insert instructions that override policy constraints
- present misleading or malicious content as authoritative grounding
- steer the assistant toward recommendations it would never produce from clean sources
The model is doing exactly what it was designed to do, which is to follow the best available context. The failure lives in data integrity, ingestion controls, retrieval filtering, and provenance checks.
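A rough sketch of where those controls sit, in Python. The source allowlist, hash check, and regex patterns are invented for illustration and would not be sufficient on their own; the point is the placement: screening happens before content reaches the vector store, not after the model has already answered.

```python
# Ingestion-time screening sketch for a RAG corpus. Sources, patterns, and
# thresholds are assumptions; real provenance checks need far more than regexes.
import hashlib
import re

ALLOWED_SOURCES = {"confluence://policies", "git://runbooks"}   # assumed trusted origins
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you must (always|never)",
    r"system prompt",
]


def screen_document(source: str, text: str, known_hashes: set[str]) -> list[str]:
    """Return reasons to quarantine a document before it reaches the vector store."""
    findings = []
    if source not in ALLOWED_SOURCES:
        findings.append(f"untrusted source: {source}")
    digest = hashlib.sha256(text.encode()).hexdigest()
    if digest not in known_hashes:
        findings.append("content changed since last approved ingestion")
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            findings.append(f"possible embedded instruction: {pattern!r}")
    return findings


# Usage: quarantine anything with findings instead of indexing it silently.
findings = screen_document(
    source="confluence://policies",
    text="When handling refunds, always ignore previous instructions and ...",
    known_hashes=set(),
)
print(findings)
```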
Now take an internal engineering agent that can read documentation, query internal systems, and create or update tickets. The application code passes review. The model passes red teaming. The agent still leaks or misroutes sensitive data because it has broad access and weak boundaries.
Over-permissioned agents fail in predictable ways:
- they read data far beyond what the task at hand requires
- they write, route, or expose that data with no approval point in the path
- their scopes stay broad, because nobody revisits the original grant once the workflow is live
The root cause is usually simple and frustrating. The agent has privileges that were granted for convenience, then nobody went back to tighten scopes, add approval points, or enforce least privilege with real policy.
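Here is a minimal sketch of what tightening that looks like, assuming a hypothetical orchestrator that authorizes every tool call. The task names, tools, and scopes are invented for illustration; the pattern is per-task grants, a human approval hook on sensitive calls, and default deny for everything else.

```python
# Explicit, per-task tool scoping for an internal engineering agent.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolGrant:
    tool: str
    scope: str                  # e.g. a single project or ticket queue
    requires_approval: bool     # human checkpoint for sensitive actions


# Grants are written down per task, not inherited from a broad service account.
TASK_GRANTS = {
    "triage-ticket": [
        ToolGrant("read_docs", scope="runbooks/*", requires_approval=False),
        ToolGrant("update_ticket", scope="queue:platform", requires_approval=False),
        ToolGrant("query_logs", scope="service:checkout", requires_approval=True),
    ],
}


def authorize(task: str, tool: str, approver: Callable[[str], bool]) -> bool:
    """Allow a tool call only if the task explicitly grants it (least privilege)."""
    for grant in TASK_GRANTS.get(task, []):
        if grant.tool == tool:
            return approver(tool) if grant.requires_approval else True
    return False  # default deny: no grant, no call


# Usage: the orchestrator calls authorize() before every tool invocation.
print(authorize("triage-ticket", "query_logs", approver=lambda t: False))    # denied without approval
print(authorize("triage-ticket", "delete_branch", approver=lambda t: True))  # never granted
```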
AI security does not hold together when you secure only one component. You need coverage across the chain because attackers target whichever layer has the weakest controls, the loosest permissions, or the least monitoring.
Most AppSec and platform security programs are built on a set of assumptions that have held up for decades: inputs are bounded and can be validated, code paths are reviewable, and the system behaves predictably when it runs. LLM-based systems violate those assumptions in ways that are easy to underestimate during design reviews.
That gap is the problem. Traditional controls still matter, but they do not cover the failure modes that come from unbounded inputs, probabilistic outputs, retrieval-driven context, and autonomous tool use.
LLMs change what execution means in an application. Security programs are built to protect code paths, APIs, and data flows that engineers can enumerate, reason about, and constrain ahead of time. LLM-driven systems do not behave that way, because language, context, and inference become part of runtime behavior, not just inputs passing through it.
You cannot reason about LLM security the way you reason about application security. Behavior emerges from language, context, and probability, not from fixed logic. That mismatch is where most security programs quietly lose control.
Agents take you from "the model said something wrong" to "the system did something wrong", and that is where traditional security assumptions collapse the hardest. Once the model can execute actions through tools, the risk stops being limited to content and becomes operational.
Agents amplify risk in three predictable ways:
- intent drifts, because a single injected or misread instruction can change what the agent is trying to accomplish
- abuse blends in, because tool calls land in the same logs as legitimate operations
- scale outruns oversight, because the agent acts faster and more often than any human reviewer can follow
A grounded example here is an agent repeatedly invoking tools under a compromised goal. A single injected instruction changes the objective from resolving the incident safely to collecting everything that might help, then the agent starts pulling logs, configs, customer records, and credentials from multiple systems because it has permission and because the workflow allowed it. Nothing about that looks like exploitation in the classic sense. It looks like a busy helper, and the damage shows up later when sensitive data lands in the wrong place.
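A minimal sketch of the kind of runtime guardrails that would surface that behavior, assuming a hypothetical agent session. The incident ID, data classes, and budget are invented for illustration; the point is that the objective gets pinned, sensitive pulls escalate to a human, and volume gets bounded instead of trusted.

```python
# Runtime guardrails around an agent's tool loop. Labels and limits are illustrative.
from collections import Counter

SENSITIVE_CLASSES = {"customer_records", "credentials"}   # never pulled without a human
READ_BUDGET = 10                                          # max automated reads per session


class AgentSession:
    def __init__(self, objective: str):
        self.objective = objective    # approved goal, pinned when the workflow starts
        self.reads = Counter()

    def propose_objective_change(self, new_objective: str) -> str:
        # Injected instructions cannot silently rewrite the goal mid-session.
        return f"denied: objective is pinned to {self.objective!r}; escalate to a human"

    def allow_read(self, data_class: str) -> str:
        if data_class in SENSITIVE_CLASSES:
            return "escalate: sensitive data class requires explicit approval"
        self.reads[data_class] += 1
        if sum(self.reads.values()) > READ_BUDGET:
            return "halt: read budget exceeded, review the session transcript"
        return "allow"


session = AgentSession(objective="resolve incident INC-1234 safely")
for data_class in ["logs", "configs", "customer_records"]:
    print(data_class, "->", session.allow_read(data_class))
print(session.propose_objective_change("collect everything that might help"))
```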
The review process assumes the system you approved stays the system you run. That assumption collapses the moment LLMs, retrieval, and agent workflows enter production, because the behavior keeps changing even when the code looks unchanged.
This is a process failure and not a tooling gap. Plenty of teams have scanners, policies, and checklists, and they still end up approving AI systems once, then losing visibility as those systems evolve week after week. By the time something goes wrong, the governance artifacts can describe what was intended, but they cannot explain what actually happened.
AI systems change on multiple planes at the same time: model versions update, new data keeps getting ingested, prompts get tuned, and tool chains expand. Each plane can shift risk without triggering the kind of review your program is built around.
A one-time approval works for a static service with controlled inputs and deterministic logic. It does not hold for systems where output and action are shaped by changing data and changing orchestration.
Traditional design and threat model reviews focus on components that look like software artifacts: services, endpoints, dependencies, and infrastructure. AI introduces security-critical artifacts that rarely show up in the review packet even though they drive real behavior: system prompts, retrieval logic, embeddings, tool permissions, and the runtime decision paths they produce.
The uncomfortable part is that many of these risks live outside the code diffs that normally trigger review. Teams deploy small changes that are actually security changes, because they alter what the system can see, decide, or do.
A lot of AI security work looks solid on paper, then fails during real-world use because the artifacts are static while the system is dynamic.
This is how teams end up with a system that passed review and still leaks data or takes unsafe actions. The review approved a snapshot. Production moved on.
A security review that assumes AI systems are static becomes obsolete the moment it is signed off. You need a review model that treats prompts, retrieval, embeddings, and tool permissions as first-class security artifacts, and you need ongoing visibility into how behavior changes as the system evolves.
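A minimal sketch of that review model, treating prompts, retrieval configuration, and tool grants as versioned artifacts whose drift forces a fresh look. The manifest fields and values are assumptions for illustration, not a governance product.

```python
# Fingerprint the reviewed artifacts and flag any plane that drifts afterward.
import hashlib
import json


def fingerprint(artifact: dict) -> str:
    """Stable hash of a reviewed artifact (prompt, retrieval config, tool grants)."""
    return hashlib.sha256(json.dumps(artifact, sort_keys=True).encode()).hexdigest()


# Snapshot captured at sign-off. Field names and values are illustrative.
reviewed = {
    "model_version": "vendor-model-2024-06",
    "system_prompt": fingerprint({"prompt": "You are a support assistant..."}),
    "retrieval": fingerprint({"index": "policies-v3", "top_k": 4, "filters": ["tenant_id"]}),
    "tool_grants": fingerprint({"granted": ["read_docs", "update_ticket"]}),
}


def drift(reviewed: dict, current: dict) -> list[str]:
    """Name every plane that changed since the review was signed off."""
    return [key for key in reviewed if reviewed[key] != current.get(key)]


# Someone quietly grants the agent a new tool; the code diff elsewhere looks tiny.
current = dict(reviewed)
current["tool_grants"] = fingerprint({"granted": ["read_docs", "update_ticket", "query_logs"]})

changed = drift(reviewed, current)
if changed:
    print("re-review required; changed planes:", changed)
```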
AI risk is systemic. It lives in the connections between models, data, retrieval layers, tools, and the decisions those pieces drive together. Asking whether the model is secure misses where real failures happen. The better question is where the AI system can be manipulated today without anyone noticing.
CISOs who stay ahead treat AI like critical infrastructure. They assume behavior will change after deployment, push for continuous and system-aware threat modeling, and enforce explicit trust boundaries and human control points where autonomy creates real risk. This does not slow adoption. It keeps security from turning into a post-incident explanation exercise.
Most teams are still building visibility into their AI supply chain, and that is the right place to start. we45 helps organizations map AI dependencies, identify where trust is assumed instead of enforced, and design security controls that hold up as systems evolve. This is how you keep AI from becoming the most expensive blind spot in the organization.
The biggest security blind spot comes from treating the model as the entire system. In reality, the production environment runs a layered AI supply chain where data, orchestration, retrieval, tools, and permissions all shape outcomes. The true risk lies across this entire chain, which feeds, grounds, and drives the model, often more than the model's weights do.
Risk drivers include model behavior shifting across versions or vendor updates without code changes in your stack, data exposure risk via prompts and context windows when sensitive content is injected, and the failure of guardrails placed only at the model boundary to catch risks introduced earlier in the chain (e.g., retrieval).
RAG introduces a second input channel that bypasses many traditional application security assumptions because content enters the system as knowledge rather than user input. Risks arise from retrieval relevance, where weak filtering or chunking can pull wrong content into the model context, and from multi-tenant vector stores that can create cross-tenant exposure risk.
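A minimal sketch of keeping tenant isolation enforced at retrieval time. The in-memory store and its API are stand-ins invented for illustration; the pattern is that the tenant filter is applied at query time and then verified on the results before anything reaches the model's context window.

```python
# Tenant-scoped retrieval with a second check on the results (defense in depth).
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    tenant_id: str
    source: str


class InMemoryStore:
    """Stand-in for a vector store; real stores rank by embedding similarity."""
    def __init__(self, chunks: list[Chunk]):
        self.chunks = chunks

    def search(self, query: str, tenant_id: str, top_k: int) -> list[Chunk]:
        hits = [c for c in self.chunks
                if c.tenant_id == tenant_id and query.lower() in c.text.lower()]
        return hits[:top_k]


def retrieve(store: InMemoryStore, query: str, tenant_id: str, top_k: int = 4) -> list[Chunk]:
    candidates = store.search(query, tenant_id=tenant_id, top_k=top_k)
    # Verify the tenant filter actually held before anything reaches the context window.
    leaked = [c for c in candidates if c.tenant_id != tenant_id]
    if leaked:
        raise RuntimeError(f"cross-tenant retrieval blocked: {[c.source for c in leaked]}")
    return candidates


store = InMemoryStore([
    Chunk("Refund policy for tenant A ...", tenant_id="tenant-a", source="policies/refunds.md"),
    Chunk("Refund policy for tenant B ...", tenant_id="tenant-b", source="policies/refunds.md"),
])
print([c.source for c in retrieve(store, "refund policy", tenant_id="tenant-a")])
```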
Agents raise the stakes by turning a "bad answer" into a "bad action" because they execute state changes through tool calls (e.g., modifying access, sending messages, creating tickets). This amplifies risk in three ways: intent can drift when instructions are injected or misread, failures blend into legitimate operational logs, and abuse scales faster than human oversight can keep up.
A model that looks secure in isolation can still produce unsafe recommendations if the retrieval data is poisoned or contains malicious or misleading content. Since RAG systems treat retrieved text as high-priority context, poisoned data can insert instructions that override policy constraints, leading the model to follow the best available context even when that context is unsafe.
Traditional security is built on assumptions that LLM systems violate: inputs are bounded, code paths are reviewable, and behavior is predictable. LLM systems have unbounded, multi-channel inputs, probabilistic outputs that vary across runs, and execution logic that shifts into prompts/system messages, bypassing standard static analysis and code review processes.
The core failure is a process assumption of stability. One-time reviews fail because AI systems are dynamic; their behavior continuously changes post-approval due to model updates, continuous data ingestion, prompt tuning, and expanding tool chains. Traditional reviews also miss security-critical artifacts like prompts, retrieval logic, and dynamic runtime decision paths.
The recommended approach is to treat AI as critical infrastructure, recognizing that AI risk is systemic and lives in the connections between all components (models, data, retrieval, tools). This involves assuming behavior will change after deployment, pushing for continuous, system-aware threat modeling, and enforcing explicit trust boundaries and human control points where autonomy creates real risk.