
If you haven't realized yet, your teams are no longer experimenting with AI. They are embedding LLM copilots into developer workflows, wiring RAG pipelines into internal knowledge systems, exposing AI-backed features to customers, and connecting third-party models to sensitive data flows.
These systems influence decisions, generate outputs that users act on, and sometimes trigger downstream automation. Treating them like conventional web services with a model API bolted on ignores how their behavior shifts with data, prompts, context windows, and integrations that were never part of your original threat models.
Prompt injection might just be the last of your worries here. It is the quiet gaps that appear when you apply familiar security controls to systems that do not behave deterministically, inherit risk from training data and plugins, and expand your trust boundaries without a corresponding update in oversight.
Typical threat models still map cleanly to a familiar pattern: user, app, APIs, data stores, and a few external integrations. But that strategy is not at all effective once you ship AI features, because the system stops being an app that calls a model and becomes a pipeline that ingests content, transforms it, retrieves context, composes prompts, calls external services, and sometimes triggers actions. Each step introduces a trust boundary, and many of those boundaries sit outside the controls your teams normally rely on.
RAG changes the way data enters the system. Instead of a handful of validated inputs, you now ingest documents, tickets, wikis, logs, chat exports, PDFs, and sometimes customer content, then you turn that mess into embeddings that drive downstream answers. That ingestion layer becomes a security boundary because it decides what the model is allowed to know at runtime.
Common failure modes show up here because the ingestion path often has weaker controls than application APIs:
Threat modeling needs to treat ingestion like a first-class interface: define what sources are allowed, what trust level each source carries, what validation happens before indexing, and what evidence you keep for traceability.
Vector stores are not just another database. In many AI architectures, they control what context gets injected into prompts, which means they directly shape model behavior. That makes them attractive to attackers aiming for data theft, output manipulation, or stealthy influence over decision-making.
A thorough model covers vector-specific risks that traditional app models miss:
Security controls also need to match how vector systems operate. Encryption at rest helps, but it does not address retrieval abuse, write-path integrity, or whether the application can prove which documents contributed to an answer.
Many teams threat model the model call as a single request-response step. Real systems rarely work that way. They chain prompts across steps like query rewriting, intent routing, tool selection, context expansion, policy checks, and summarization. Each step passes intermediate text that can carry instructions, data, and hidden assumptions forward.
That chaining creates security-relevant behavior that needs explicit modeling:
At threat-model time, treat prompt chains like a workflow graph. Map the steps, define what data is permitted at each step, define which step is allowed to call tools, and define what validation must happen before tool execution.
External model providers expand the attack surface in ways that standard third-party risk questionnaires tend to flatten. The main issue is your runtime behavior still depending on a remote service you do not control, receiving sensitive context that you assembled internally.
Threat modeling should cover:
This layer also affects incident response. You need a plan for evidence collection and timeline reconstruction that does not depend on hoping the provider can answer questions quickly when something goes wrong.
A lot of AI threat models still look like a standard web app diagram with an LLM box attached to the side. It misses the parts where risk actually concentrates: the embedding pipeline, the retrieval logic that selects context, and the model interaction layer where prompts, policies, and tool execution meet.
A more realistic AI threat model explicitly captures:
When you model those layers directly, you stop treating AI risk as model risk and start treating it as system risk. And that's where the blind spots show up, and it’s where the fixes become actionable.
AI data leakage rarely looks like exfiltration from a database, and that’s why it slips through mature programs. The system can stay up, your access logs can look normal, and nobody has to bypass a network control, yet sensitive information still shows up in an answer, a completion, a summary, or even in the pattern of what the model refuses to say.
Security teams tend to anchor on the classic breach story because that’s how most controls and detections are designed, but model behavior creates leakage paths that behave more like abuse of logic and context than theft of rows.
Large models can reproduce snippets of training data under certain conditions, especially when fine-tuned carelessly, trained on internal corpora without scrubbing, or evaluated without strong leakage testing. This does not require an attacker to get into anything, it only requires them to find prompts that trigger recall, sometimes through repeated probing.
The practical problem is that memorization tends to surface as plausible output, which means a user can copy sensitive content without realizing it came from a protected source, and the system might not leave a clean audit trail that ties the output back to a specific document.
Teams should treat the training and tuning pipeline as part of the data lifecycle, with controls that match the sensitivity of the source data:
In a RAG architecture, retrieval decides what the model sees, and what the model sees is what it can leak. Attackers do not need direct access to the underlying document store when they can manipulate retrieval through crafted queries, query rewriting behavior, or repeated probing that gradually pulls sensitive fragments into the context window.
Once sensitive text lands in context, the model can reproduce it verbatim, summarize it, translate it, or transform it into a different representation that still reveals the underlying data.
RAG leakage often comes from gaps in retrieval controls rather than the model itself:
A key point that keeps biting teams is that access control enforced at the UI layer does not automatically apply at the model retrieval layer. The UI can hide a document, the retrieval pipeline can still fetch chunks from it unless you enforce the same authorization decisions inside retrieval, with the same identity, the same entitlements, and the same auditability.
Connectors and sync jobs tend to get approved as just integrations, then they become the largest expansion of sensitive surface area in the entire system. The failure mode is in its convenience: a connector gets read access to a broad Confluence space, a shared drive, a ticketing project, or an email archive, and the retrieval layer treats all ingested content as fair game.
The control set needs to be connector-specific, because connectors behave like privileged service accounts:
Models can leak through what they reveal indirectly. Users can infer sensitive attributes from consistent behaviors: differences in refusal patterns, confidence levels, response latency, or subtle variations in phrasing after repeated queries. This shows up as membership inference (whether a record or document exists in the corpus), attribute inference (what can be deduced about a subject), and reconstruction attempts where an attacker uses iterative prompts to recover sensitive content in fragments.
Mitigations are practical, but they require acknowledging that output is a security boundary:
Teams often end up with models trained or fine-tuned outside approved pipelines, sometimes by a product group trying to move fast, sometimes by a business unit solving a local problem. This is where internal data tends to get pulled into training datasets with weak governance, then the model gets deployed behind a lightweight UI and treated as low risk because it is internal.
That combination creates exposure: sensitive data goes into training, the model becomes a new distribution channel, and logging, retention, and access control rarely meet enterprise standards.
Shadow AI risk shows up in predictable ways:
A lot of programs still treat the database as the relevant boundary, then they assume everything upstream inherits those controls. AI systems break that assumption because representations of data and the paths that fetch it matter just as much as the source.
Data classification and access control need to extend into:
When teams treat embeddings, retrieval, and outputs as security boundaries with explicit controls, leakage becomes something you can prevent and detect, rather than something you explain after screenshots show up in a support ticket.
Security teams still instinctively look for an exploit chain in the application, because that’s where the tooling and muscle memory lives. Prompt injection breaks that pattern. The attacker does not need RCE, SQLi, or a deserialization bug, they just need a way to get text into the model’s working context and enough leverage over how the system interprets it. Once that happens, the model can be steered into ignoring policy, revealing restricted data, calling tools in unsafe ways, or producing outputs that trigger business workflows you never intended to expose.
This is also where people get burned by false confidence in SAST and DAST. Those tools do valuable work, but they are built to reason about code paths, inputs, and sink functions in deterministic software. Prompt injection lives in the behavior layer: how system instructions, retrieved content, user prompts, tool outputs, and memory interact at runtime. A clean scan does not mean a safe assistant.
Every LLM app has competing instruction sources. You have system prompts, developer prompts, user prompts, retrieved context, tool outputs, and sometimes a memory layer. Prompt injection works by crafting content that causes the model to treat untrusted text as higher priority guidance, or to reinterpret the task boundaries in a way that benefits the attacker.
This typically succeeds when the system relies on how the model will follow the system prompt as the control, instead of enforcing policy outside the model. Common design gaps include:
A practical threat model treats all non-system content as untrusted, then assumes an attacker will actively try to create instruction conflict.
Indirect injection is the version that shows up in real enterprises because the attacker does not need to talk to the model directly. They place malicious instructions inside content that your system later retrieves as relevant context, such as a wiki page, a support ticket, a shared document, a code comment, a PDF, or even a web page that gets ingested through a connector.
Once that content is retrieved, it sits beside legitimate business context, and the model has to decide what to follow. That creates a path to abuse in systems that do any of the following:
Remember, a single malicious sentence in a doc can become a control bypass once it is pulled into context, especially when the assistant has the ability to retrieve broadly and act.
Role confusion shows up when the system is not explicit, in code and in architecture, about which messages are authoritative and which are untrusted content. In practice, that happens when prompts are built as a single text blob, when tool outputs are inserted without strict structure, or when memory and chat history are treated as equally valid sources of instruction.
Attackers exploit this by crafting text that impersonates higher-privilege roles or creates fake boundaries, for example by claiming to be system instructions, security policies, or admin guidance. The model may not literally believe it, but it can still follow it because the content is highly directive and close to the immediate task. Without strict separation and validation, the system ends up with policy that is easy to socially engineer.
Even without tool calling, LLM output can become an execution primitive the moment another system treats it as authoritative. This is where prompt injection turns into business impact, because the attacker can shape outputs that trigger workflow steps, approvals, exceptions, refunds, access grants, or customer-facing decisions.
This tends to surface in patterns like these:
The issue is not the model making a mistake, the system giving the model output a level of authority that was never designed for adversarial conditions.
Prompt injection vulnerabilities rarely show up as a vulnerable function call or a missing output encoding step, because the weakness lives in how the system composes context and how it authorizes actions. That is why traditional scanning can report that everything is all clear while the assistant still leaks data, bypasses policy, or triggers unsafe workflows through nothing more than text.
Behavior-level adversarial testing focuses on what matters in AI systems: whether an attacker can steer outcomes. A serious test plan includes:
Secure AI systems by testing behavior under adversarial inputs, because that is where the real attack surface lives, and it is where your existing code-level scanning tools have no visibility.
Just because your provider have a good reputation, doesn't mean nothing can go wrong. It helps, but it does not make the enterprise problem go away, because you still send sensitive context over an API, you still depend on vendor-controlled behavior at runtime, and you still own the outcomes when the system leaks data, makes a bad decision, or violates a regulatory requirement.
Hosted models change the shape of the risk, and that change is easy to underestimate because it sits in procurement language, API configuration, and operational drift rather than in a classic vulnerability report.
With third-party AI, your prompts and outputs become data that exists outside your environment, even when the provider says they secure it. First and foremost, ask these questions: what do they keep, for how long, and for what purpose. Because retention directly affects breach impact, eDiscovery exposure, regulatory scope, and internal privacy commitments.
The security and legal reality tends to come down to details that teams gloss over during onboarding:
Treat this like a data-sharing agreement, because that is what it becomes in practice once you send production context into the model.
Hosted models are still accessed through software interfaces, and those interfaces are easy to misconfigure in ways that leak data or weaken controls. Teams routinely ship with overly broad keys, weak network boundaries, permissive scopes, and missing runtime constraints because the integration feels like just another SaaS API.
The failure patterns you want to threat model and test look like this:
This is the part that frustrates incident response teams, because the logs often show valid API usage while the actual behavior is clearly unsafe.
Even when the provider’s core service is sound, logging around the service can leak. Many orgs log full prompts and outputs for debugging, evaluation, or product analytics, then route those logs into shared SIEMs, APM tools, ticketing systems, and vendor support cases. The result is sensitive content replicated into places with weaker access controls and longer retention than the primary data store.
Controls need to cover both sides of the boundary:
Hosted model providers ship updates. Sometimes those updates improve quality or safety. Sometimes they change refusal behavior, prompt adherence, tool selection, or output formatting in ways that break downstream guardrails. A model that behaved predictably last quarter can behave differently after an update, even when your code is unchanged, because the model itself is a moving dependency.
The security implication is behavior drift, and it shows up in places that matter:
Behavior drift is an operational risk, and you manage it the same way you manage other moving dependencies: testing, monitoring, and controlled rollouts.
Most enterprises do not get deep transparency into how a hosted model was trained, what data sources influenced it, or how updates were validated internally. That constraint does not make hosted models unusable, it just means you cannot treat the provider’s reputation as a substitute for your own assurance.
Your program needs compensating controls that focus on outcomes you can measure:
Third-party AI can be a solid choice, and it still demands discipline that matches the risk.
You want three things in place:
Hosted model adoption works when you treat the provider as part of your production security boundary, because that is exactly what it becomes the moment sensitive context leaves your environment.
The governance gap usually starts with basic visibility. Teams deploy copilots, internal assistants, decision-support models, and AI features inside products, then nobody maintains a single view of what exists, where it runs, what data it touches, and what external services it depends on. Without that baseline, every other control becomes inconsistent because teams are guessing about scope.
Many enterprises can point to risk frameworks, but struggle to produce a documented assessment that matches how the AI system actually works in production. Review artifacts are missing, incomplete, or detached from real architecture and real data flows, which makes them hard to defend under scrutiny.
A defensible model risk assessment captures the system’s reality, including:
Traditional systems support post-incident reconstruction because you can trace a request to a code path, a database query, and an outcome. AI systems introduce steps that are harder to reconstruct: retrieval ranking, prompt assembly, intermediate chain steps, model sampling behavior, and tool execution decisions driven by generated text.
Traceability needs to include more than application logs:
Security and compliance teams cannot govern what they cannot enumerate. AI inventory is often scattered across product teams, innovation groups, data science, and shadow projects, with inconsistent naming and inconsistent ownership. A reliable inventory is not a spreadsheet, it is a living register tied to deployment, approvals, and monitoring.
A workable AI system inventory covers:
AI systems often land between org boundaries: product owns delivery, data science owns model tuning, security owns risk, legal owns policy, procurement owns the vendor, and engineering owns runtime. When ownership is not explicit, decisions become informal, and the most important calls, like what data can be retrieved, what gets logged, what gets retained, and what is allowed to trigger actions, get made without a single accountable owner.
Clear ownership usually requires naming both:
Model updates change behavior. Fine-tuning can increase memorization risk, shift refusal behavior, and alter how the system responds to sensitive prompts. Retrieval updates can change what the model sees, which changes what it can leak. Many teams treat these updates like internal experimentation, then push changes with limited change control because they think it’s just a model update.
A solid audit trail captures:
Some AI use cases trigger expectations for transparency, human oversight, and explainability, especially when decisions affect customers, employees, credit, healthcare, insurance, or access to services. Even when regulation does not explicitly mandate explainability for a given deployment, internal audit and external stakeholders often expect the organization to justify how decisions are made, how bias is managed, and how errors are detected and corrected.
This is where governance has to move beyond having a policy and into showing the actual evidence.
AI security tends to fail at the moment it gets scoped to review the feature, ship the feature, and move on. That framing works for a static endpoint or a UI change, but AI-enabled capabilities behave like systems with their own lifecycle, dependencies, and operational failure modes. The model can change, the retrieval corpus can change, the prompt chain can change, and the surrounding product can change how outputs get consumed.
A one-time assessment can tell you the architecture looked reasonable on a given day, but it cannot tell you whether the system stays safe as it evolves through retraining, connector expansions, vendor updates, routing changes, and new data sources.
The lifecycle for an AI system includes data sourcing, ingestion, embedding generation, retrieval tuning, prompt and policy updates, tool integration changes, and operational adjustments made to improve quality or reduce cost. Security needs coverage across those lifecycle touchpoints because each one can introduce new leakage paths, new manipulation opportunities, and new compliance scope.
The lifecycle elements that deserve explicit security control look like this:
Teams retrain to improve quality, expand coverage, or incorporate new data. They also update retrieval corpora continuously through sync jobs and connectors. Both actions can change what the model can reveal and what it can be manipulated into doing, which means they need the same seriousness you apply to a production release.
The practical risks that show up after retraining or corpus expansion tend to be repeatable:
Treat retraining and corpus updates as production changes with required evidence, including pre-change evaluation, post-change monitoring, and clear rollback criteria.
Most orgs monitor availability and latency, then they assume the rest is model quality. That leaves a blind spot because behavior drift is a security issue when the model starts responding differently to the same inputs, especially around sensitive topics, access boundaries, and tool usage. Drift can come from model updates, prompt edits, retrieval tuning, ranking changes, safety policy updates, temperature changes, and even changes in the underlying data being retrieved.
Operational monitoring should include security-relevant behavioral signals, such as:
AI systems generate outputs that can carry sensitive content, and they often do it through multi-step chains that are difficult to reconstruct after the fact. Without structured logs that capture context lineage and policy decisions, teams end up unable to answer basic questions during an incident: what content was retrieved, what instructions were present, what tool calls happened, and why the system produced that output.
At a minimum, observability should support:
This has to be engineered carefully to avoid creating a second leakage channel through logs, which means redaction, access controls, and retention policies become part of the design.
Most teams do some safety testing before launch, then move on because release pressure is real. That approach misses how attackers behave and how AI systems evolve. Red teaming needs to be continuous and scoped to your architecture and business workflows, especially where the model can access sensitive data, where it can influence decisions, and where it can trigger actions.
A pragmatic red teaming program targets:
AI risk is becoming a board-level topic, and the uncomfortable reality is that most exposure today comes from systems that passed review and still behave in ways nobody fully modeled. The gap is the mindset that treats AI as an extension of existing controls instead of a system with its own trust boundaries, lifecycle, and failure modes. Leaders who adjust their mental model now will avoid explaining preventable incidents later.
The next phase of AI security will separate teams that rely on static approvals from those who build continuous validation into architecture, governance, and operations. That means treating retrieval as a permission boundary, output as a control surface, vendor models as part of your perimeter, and behavioral drift as a measurable risk. It also means aligning security, product, and legal around shared ownership before regulators and customers force that alignment under pressure.
If you want to pressure-test your AI systems beyond surface-level reviews, we45’s AI security services focus on architecture risk, adversarial testing, RAG pipeline validation, and continuous model threat modeling.
The right next step is not another checklist, but a real assessment of how your AI systems behave under stress.
Traditional security controls treat AI like a conventional web service with a model API. This ignores how AI behavior shifts based on data, prompts, context windows, and integrations, leading to quiet gaps that conventional threat models miss. The risk shifts from code paths to system behavior, retrieval logic, and prompt chaining.
Typical threat models focus on the user, app, and data stores. An AI system, however, is a pipeline that ingests, transforms, retrieves context, composes prompts, and calls external services. Incomplete threat modeling misses the new trust boundaries introduced by RAG pipelines, vector databases, prompt chains, and third-party model APIs, where risk truly concentrates.
RAG (Retrieval-Augmented Generation) pipelines change how data enters the system by ingesting a wide variety of content (wikis, logs, PDFs, chat exports) and turning it into embeddings. Common failure modes include poisoned source content, over-broad connectors pulling in out-of-scope content, weak provenance, and inconsistent chunking that can surface sensitive fields unexpectedly.
Vector stores sit at the center of the answer path, directly shaping model behavior by controlling which context is injected into prompts. This makes them attractive targets for data theft, output manipulation, and stealthy influence. Specific risks include unauthorized reads (leaking embedded content), unauthorized writes (inserting malicious chunks), and similarity search abuse for probing sensitive topics.
Real-world AI systems chain prompts across multiple steps (query rewriting, intent routing, tool selection). This creates implicit execution paths where injected instructions from retrieved content can propagate into later steps, potentially influencing tool calls or policy enforcement. Threat models must map this workflow graph to define data permissions and validation before tool execution.
AI data leakage often occurs through abuse of logic and context rather than traditional database exfiltration. The paths include: training data memorization (reproducing sensitive training snippets), retrieval abuse in RAG systems (manipulating queries to pull sensitive fragments into context), over-permissive document connectors, and output-based inference attacks (leaking information indirectly through consistent response patterns).
Retrieval abuse occurs when an attacker manipulates the retrieval process, such as through crafted queries, to pull sensitive fragments into the model’s context window. Retrieval effectively becomes the real permission boundary. Gaps like overly broad retrieval scope or weak tenant and role filtering mean a model can still fetch and leak chunks from a document, even if the UI is hiding it from the user.
Direct Prompt Injection is when an attacker crafts a malicious instruction in the user prompt itself to override the system's instructions. Indirect Prompt Injection is when an attacker places malicious instructions inside external content (like a wiki page, PDF, or shared document) that the system later retrieves as relevant context, causing the model to follow the untrusted text.
Prompt injection is a vulnerability in the behavior layer—how system instructions, user prompts, and retrieved content interact at runtime. SAST (Static Application Security Testing) and DAST (Dynamic Application Security Testing) are built to reason about code paths and functions in deterministic software. They will report a clean scan even when the system can be manipulated through text-based inputs.
A one-time security assessment for an AI feature is insufficient because AI-enabled capabilities behave like systems with their own continuous lifecycle. The model, the retrieval corpus, the prompt chain, and the vendors can all change over time. Lifecycle security demands continuous coverage over data sourcing, retraining, corpus updates, and prompt/policy configuration management, as all these can introduce new risks even when no code is shipped.