RAG systems are leaking sensitive data… and most teams have no idea it’s happening.
Yes, the problem is the output of the LLM, but it’s also the entire RAG pipeline: how you build embeddings, what your retrievers surface, and what your vector store contains. Sensitive internal documents, source code, and even regulated data can get pulled into responses without anyone noticing.
You might think your prompts are safe. But attackers aren’t targeting prompts. They’re exploiting weak defaults, insecure configurations, and the fact that most teams haven’t built a real testing model for these systems.
Standard LLMs are already complex to secure. But once you introduce a RAG system, you’re no longer just managing prompt input and model behavior. You’re also injecting your own data into the pipeline, and that data is usually sensitive, unstructured, and retrieved dynamically. That changes the threat surface entirely.
Here’s what most teams overlook when they adopt RAG.
With RAG, you augment the LLM’s responses by pulling context from your own data sources. This means anything embedded in your vector store (PDFs, internal wikis, customer records, config files) can be retrieved and shown in responses. If that data wasn’t cleaned, classified, or scoped properly before ingestion, it becomes retrievable with the right query.
We’ve seen internal emails, HR documents, and unpublished strategy decks come back in output just because they were sitting in a directory that got batch-embedded without filters.
Data exposure in RAG setups tends to happen in a few recurring ways.
In internal red-team assessments, we’ve triggered leaks from otherwise safe LLM apps simply by prompting them with organizational terms or jargon. Because the vector store contained embedded internal documents, the LLM confidently returned answers with excerpts containing PII, credentials, or contract clauses.
In one case, a red team member retrieved API keys that had been buried in a markdown file embedded weeks earlier. In another, a prompt about company priorities returned slides from a non-public board deck.
These were RAG-specific failures: overly permissive pipelines built without threat modeling, policy enforcement, or visibility into what was getting embedded and exposed.
Even if you’ve sandboxed the model, blocked unsafe prompts, and tested the base LLM for jailbreaking, those controls don’t touch the RAG layer. This is a different surface. It needs its own review model focused on what’s in the store, how retrieval behaves, and what data can realistically be surfaced from a casual internal query.
Most RAG pipelines look clean from the outside. One LLM, a vector DB, maybe a retriever. But once you trace the data flow end to end, the exposure points multiply quickly. Each component brings its own risks, and most security teams haven’t fully mapped what those risks look like in practice.
Here’s how the attack surface actually breaks down.
Everything starts with what you feed into the system. If you’re ingesting unfiltered internal documents, wikis, meeting notes, or support tickets, sensitive data gets embedded by default. Most teams don’t apply DLP, data classification, or redaction before indexing. That’s a direct path to exposure.
We’ve seen entire customer onboarding guides, HR spreadsheets, and config files with tokens get embedded just because they were in a shared folder someone pointed the pipeline at.
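As a rough illustration of that pre-ingestion gate, here is a minimal sketch of scrubbing chunks before they ever reach the embedding step. The patterns and the `prepare_for_embedding` helper are assumptions for illustration; a production pipeline would put a real DLP engine or classifier here.

```python
import re

# Hypothetical pre-ingestion gate: redact obvious secrets and PII before a
# chunk is ever embedded. A real pipeline would call a DLP engine or trained
# classifier here; these patterns are illustrative only.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{16,}\b"),
}

def scrub_chunk(text: str) -> tuple[str, list[str]]:
    """Redact matches in place and report which categories were found."""
    findings = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
            text = pattern.sub(f"[REDACTED:{label}]", text)
    return text, findings

def prepare_for_embedding(chunks: list[str]) -> list[str]:
    """Only pass scrubbed chunks downstream; log anything that was redacted."""
    cleaned = []
    for chunk in chunks:
        scrubbed, findings = scrub_chunk(chunk)
        if findings:
            print(f"redacted {findings} before embedding")
        cleaned.append(scrubbed)
    return cleaned  # hand these to the embedding step, never the raw chunks

if __name__ == "__main__":
    print(prepare_for_embedding(
        ["Contact jane.doe@corp.example, token sk-abc123def456ghi789 attached."]
    ))
```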
Once a document gets embedded, you lose metadata like access levels, sensitivity labels, or document owners. All that control disappears. Embeddings are just vectors. That means your retriever can pull from high-sensitivity content without knowing it’s doing so. There’s no built-in concept of “confidential” unless you explicitly engineer for it.
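One way to keep that control from disappearing is to carry classification metadata alongside every vector you write. A minimal sketch, assuming your store accepts per-record metadata; the `sensitivity`, `owner`, and `source_path` fields are illustrative, not a standard schema.

```python
from dataclasses import dataclass, field

# Sketch of carrying classification metadata with every embedded chunk. Most
# vector stores accept arbitrary per-record metadata; the field names here
# are assumptions, not a standard schema.
@dataclass
class IndexedChunk:
    text: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)

ALLOWED_LABELS = {"public", "internal", "confidential", "restricted"}

def build_record(text: str, vector: list[float], *, sensitivity: str,
                 owner: str, source_path: str) -> IndexedChunk:
    """Refuse to index anything that arrives without a valid sensitivity label."""
    if sensitivity not in ALLOWED_LABELS:
        raise ValueError(f"unknown sensitivity label: {sensitivity!r}")
    return IndexedChunk(
        text=text,
        vector=vector,
        metadata={
            "sensitivity": sensitivity,
            "owner": owner,
            "source_path": source_path,
        },
    )

# A retriever can later filter on metadata["sensitivity"] instead of treating
# every vector in the store as equally safe to surface.
```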
We’ve tested RAG setups with vector DBs exposed over open APIs, running with default configs, or protected by nothing more than a weak token. In several cases, staging and dev environments had indexes pointing to live production embeddings. That’s shadow data waiting to be queried.
ACLs are another weak spot. Most vector DBs don’t enforce fine-grained access control by default. If your application pulls from multiple namespaces, a simple prompt injection can trigger cross-namespace retrieval unless you’ve enforced isolation explicitly.
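If the store won’t enforce isolation for you, the application layer has to. A minimal sketch of that check, where `vector_search` stands in for whatever query call your vector DB client exposes and the role-to-namespace map is a hypothetical example:

```python
# Sketch of application-layer namespace isolation, since many vector DBs do
# not enforce it by default. `vector_search` is a stand-in for whatever query
# call your client exposes, and the role-to-namespace map is illustrative.
ROLE_NAMESPACES = {
    "support_agent": {"kb_public", "kb_support"},
    "hr_analyst": {"kb_hr"},
}

def scoped_search(query_vector, user_role: str, namespace: str, vector_search):
    """Only query a namespace the caller's role is explicitly entitled to."""
    allowed = ROLE_NAMESPACES.get(user_role, set())
    if namespace not in allowed:
        raise PermissionError(f"{user_role!r} may not query namespace {namespace!r}")
    # Never derive the namespace from model output or raw prompt text; a
    # prompt injection should not be able to steer the query elsewhere.
    return vector_search(query_vector, namespace=namespace)
```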
Out-of-the-box retrievers match based on similarity instead of intent. That means they’ll pull anything that’s semantically close to the query even if it wasn’t meant to be included. Without tight filters or context-based constraints, you end up surfacing chunks that don’t belong in the user’s scope.
We’ve seen simple prompts like “summarize company goals” trigger retrievals from board decks, roadmap discussions, or M&A strategy documents. No special jailbreak required.
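A cheap mitigation is a post-retrieval guard that drops any hit whose classification exceeds the caller’s scope, regardless of how well it matched. A sketch, assuming hits carry the metadata attached at indexing time and using an illustrative clearance ladder:

```python
# Sketch of a post-retrieval guard: drop any hit whose classification exceeds
# the caller's scope, no matter how well it matched. Assumes each hit carries
# the metadata attached at indexing time; the clearance ladder is illustrative.
CLEARANCE = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

def filter_hits(hits: list[dict], user_clearance: str) -> list[dict]:
    max_level = CLEARANCE[user_clearance]
    allowed = []
    for hit in hits:
        label = hit.get("metadata", {}).get("sensitivity", "restricted")
        if CLEARANCE.get(label, 3) <= max_level:
            allowed.append(hit)
        # Anything dropped here is still worth logging: the similarity match
        # reached content this user was never supposed to see.
    return allowed
```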
These failure patterns show up again and again in red-team testing.
In one case, a red teamer was able to query a staging environment’s vector DB that had been seeded with real production data. None of the access controls were enforced because the endpoint was flagged as test-only.
Your LLM is just one part of the system. The real attack surface starts with your documents, moves through how you embed and store them, and ends with how your retriever serves them up. If you’re not securing each layer, the gaps stack up. And when they do, a single query can turn into a data leak.
Most red team exercises for LLMs stop at prompt testing, and that’s simply not enough. If you’re using RAG, the risk is in the architecture. You’re exposing new paths for data retrieval, transformation, and leakage that traditional AppSec reviews don’t cover.
Here’s how to run a RAG-specific security assessment that actually finds the problems before attackers or auditors do.
Start by diagramming how the RAG pipeline works. Include the embedding model, retriever, vector store, LLM, and any pre- or post-processing logic. Document where these components live (prod, staging, serverless) and how they’re accessed. This gives you the real attack surface instead of just the public-facing API.
Figure out where your input data comes from. What sources feed the vector store? Who owns them? What controls exist before ingestion? Then inspect what happens during embedding: are classification tags lost? Is PII stripped? If you’re batch-embedding large internal corpora, there’s a good chance you’re exposing documents that were never meant to be surfaced.
Use a vector extractor or the store’s own export tooling to inspect sample records and validate what kind of content is recoverable from the stored text and metadata.
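A minimal version of that audit can be as simple as paging through stored records and scanning their text payloads. The sketch below assumes your store can return records as `(id, text, metadata)` tuples; the patterns are illustrative only.

```python
import re
from collections import Counter

# Sketch of an embedding-store audit: sample stored records and check how much
# sensitive content is directly recoverable from their text payloads. Assumes
# the store can hand back records as (id, text, metadata) tuples.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "credential": re.compile(r"(?i)(password|api[_-]?key|secret)\s*[:=]\s*\S+"),
}

def audit_sample(records) -> Counter:
    findings = Counter()
    for record_id, text, metadata in records:
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                findings[label] += 1
                print(f"{record_id}: recoverable {label} "
                      f"(labeled {metadata.get('sensitivity', 'unlabeled')})")
    return findings

if __name__ == "__main__":
    sample = [("doc-17", "db password: hunter2", {"sensitivity": "internal"})]
    print(audit_sample(sample))
```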
Test access to the vector store directly. Check for weak or missing auth, over-permissioned access roles, or endpoints exposed in staging. Query the vector index using common terms, internal jargon, and fuzzed inputs. You’re looking for retrievable data that shouldn’t be there.
Dump sample results and compare them against your data classification policy. If you can pull sensitive data without triggering an alert, that’s a gap.
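A simple probe script captures this: fire organization-specific terms at the index and flag every hit your classification policy says should not surface to the probing identity. `search` is a stand-in for your retriever or vector-store query call, and the probe terms and labels are examples.

```python
# Sketch of a retrieval probe against the vector index. `search` stands in for
# your retriever or vector-store query call; probe terms are examples.
PROBE_TERMS = ["layoffs", "board deck", "acquisition", "salary bands", "prod credentials"]
DISALLOWED = {"confidential", "restricted", "unlabeled"}

def probe_index(search, top_k: int = 5) -> list[dict]:
    violations = []
    for term in PROBE_TERMS:
        for hit in search(term, top_k=top_k):
            label = hit.get("metadata", {}).get("sensitivity", "unlabeled")
            if label in DISALLOWED:
                violations.append({"term": term, "id": hit.get("id"), "sensitivity": label})
    # Every entry is a chunk that is retrievable today and should not be.
    return violations
```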
Don’t just prompt for jailbreaks. Craft queries that simulate real-world insider or lateral movement attempts. Ask vague questions that mimic curiosity or data-mining behaviors. Use prompts that reference internal project names, business units, or acronyms. These often trigger retrievals from buried document chunks that bypass traditional red team payloads.
Use diffing tools to compare outputs across queries and track how often sensitive embeddings are exposed.
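A lightweight way to do that diffing, assuming you seeded test documents with known canary markers: record which markers each prompt’s response exposes, and diff response pairs to spot leaked excerpts.

```python
import difflib

# Sketch of output diffing across adversarial queries. Assumes test documents
# were seeded with known canary markers; the report shows which prompts exposed
# which markers, and the diff helps spot leaked excerpts between two responses.
def exposure_report(responses: dict[str, str], markers: list[str]) -> dict[str, list[str]]:
    report = {}
    for prompt, text in responses.items():
        report[prompt] = [m for m in markers if m.lower() in text.lower()]
    return report

def diff_responses(a: str, b: str) -> str:
    """Line-level diff of two responses, useful for spotting leaked paragraphs."""
    return "\n".join(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))
```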
Validate that your system enforces context boundaries between users. If embeddings from one session bleed into another, you have a leakage path. Push the context window to its limit. Interleave prompts from different domains and verify whether responses include data from unrelated sessions or users.
Context isolation is one of the most missed risks in early-stage RAG implementations.
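A canary-based check makes this concrete: seed one session with a unique string, then ask from another session and see whether it ever comes back. `ask` is a stand-in for your application’s query entry point; the session handling is assumed, not prescribed.

```python
import uuid

# Sketch of a cross-session bleed test: seed one session with a canary string,
# then check whether another session can ever retrieve it. `ask` is a stand-in
# for your application's query entry point, called with a session id.
def context_isolation_check(ask) -> bool:
    canary = f"CANARY-{uuid.uuid4().hex[:12]}"

    ask("session-a", f"Remember this internal project codename: {canary}")
    reply = ask("session-b", "What internal project codenames have been mentioned recently?")

    if canary in reply:
        print("cross-session leakage: session B saw session A's canary")
        return False
    return True
```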
Enable full logging for inputs, retrieved chunks, and LLM outputs. Don’t just look at what the user sees; also track the provenance of each response. Was a specific paragraph pulled from a restricted doc? Did a retrieval include a chunk that wasn’t supposed to be indexed?
Cross-reference output content with your document corpus. If the model is surfacing unexpected information, it’s a retrieval problem and not a prompt issue.
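A sketch of that retrieval-level log, written before the LLM ever sees the chunks. Field names are assumptions; the point is that the record exists independently of the final output, so a masked answer still leaves an exposure trail.

```python
import json
import logging
from datetime import datetime, timezone

# Sketch of retrieval-level logging: record every chunk the retriever returns,
# with its source and classification, before the LLM sees it. Field names are
# assumptions; the log exists independently of the model's output.
log = logging.getLogger("rag.retrieval")
logging.basicConfig(level=logging.INFO)

def log_retrieval(user_id: str, query: str, hits: list[dict]) -> None:
    for hit in hits:
        meta = hit.get("metadata", {})
        log.info(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user_id,
            "query": query,
            "chunk_id": hit.get("id"),
            "source": meta.get("source_path"),
            "sensitivity": meta.get("sensitivity", "unlabeled"),
        }))
        if meta.get("sensitivity") in {"confidential", "restricted"}:
            # Treat this as an exposure event even if the final answer masks it.
            log.warning("restricted chunk %s returned for user %s", hit.get("id"), user_id)
```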
Use targeted tools at each layer of the RAG stack to uncover misconfigurations, weak policies, and silent leakage paths. Your assessment toolkit should cover every stage: ingestion, embedding, the vector store, the retriever, and the LLM’s output.
Delaying your RAG deployment is not the answer. You need to ship it with controls that actually reduce risk, and those controls are the baseline if you’re planning to expose RAG to internal teams or external users.
Here’s what needs to be fixed, enforced, and continuously validated before you launch.
Every document going into your vector store should be scrubbed for sensitive content and tagged with classification metadata. That includes PII, credentials, secrets, internal project details, and anything regulated. Don’t batch-embed without inspection; otherwise you’re seeding future leaks.
Retrievers should never pull from unrestricted or unfiltered indexes. Apply namespace boundaries and enforce them strictly. If your retriever can query across multiple datasets, scope them by role, purpose, or use case.
Just because users can query a retriever doesn’t mean they should see every chunk in the store. Embedding-level access control matters just as much as LLM prompt validation.
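One way to express that control is a per-chunk ACL check after retrieval, with default-deny for anything unlabeled. The `allowed_groups` metadata field is an assumption for illustration; the caller’s group set would come from your identity provider.

```python
# Sketch of per-chunk access enforcement after retrieval, with default-deny for
# anything unlabeled. The allowed_groups metadata field is an assumption.
def authorize_chunks(hits: list[dict], user_groups: set[str]) -> list[dict]:
    visible = []
    for hit in hits:
        allowed = set(hit.get("metadata", {}).get("allowed_groups", []))
        # A chunk with no ACL metadata is treated as restricted, not public.
        if allowed and allowed & user_groups:
            visible.append(hit)
    return visible
```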
You need full visibility into what gets retrieved, not just what the LLM outputs. If a sensitive chunk was returned by the retriever, that’s a data exposure event even if the final response masked it.
Treat your RAG pipeline like a high-risk component. Threat model it. Validate it. Monitor it. Don’t leave it out of your AI governance strategy.
A single pre-launch review won’t hold up six months later. RAG pipelines evolve. Your controls need to keep up.
Getting this right doesn’t mean locking everything down. It means putting controls where the real risks live. If your vector store is exposed, your prompt controls are irrelevant. And if your retriever behavior isn’t auditable, you’ll never know what leaked until someone shows you a screenshot.
RAG pipelines are core infrastructure with live access to your data, and they’re already being exploited in environments that assumed LLM security was enough. Most teams are still focused on prompt safety, while embedding and retriever logic remain unaudited, unfiltered, and unmonitored.
Relying on RAG to power internal tools, customer-facing features, or product intelligence means that you’re now operating a data delivery system that needs policies, enforcement, monitoring, and real tests.
In the next 12 to 18 months, expect tighter regulation around how enterprise AI systems handle retrieval, classification, and user context. Audit logs, retriever scope, and embedding boundaries will become standard parts of risk reviews.
RAG is not the future. It’s already in production. And not testing it is as good as accepting that you have blind spots you’re not willing to fix.
But what if your RAG system really is exposed?
we45’s RAG System Security Assessment gives you a full-stack, adversarial review of your implementation that covers data ingestion, vector stores, retriever logic, and LLM outputs. We test your system the way attackers will.
You’ll get a clear, actionable report mapped to the OWASP Top 10 for LLMs, the NIST AI RMF, and real-world risk, without any black-box scans.
A Retrieval-Augmented Generation (RAG) system combines a large language model (LLM) with a retriever and a vector store that injects custom data into the model’s responses. The risk comes from that injected data. If sensitive documents are embedded without proper controls, the retriever can surface private, regulated, or confidential information through simple queries.
Yes. A secure LLM with strong prompt filtering can still leak sensitive data if the RAG pipeline is poorly configured. Leakage often occurs through exposed vector stores, weak retriever logic, or embedded documents that contain sensitive content. The base model doesn’t filter the data it’s fed through the retriever.
Attackers typically exploit RAG pipelines using prompt injections, broad or vague queries, or direct API access to retrievers or vector stores. They may bypass authentication, trigger retrieval from hidden namespaces, or extract sensitive chunks using carefully crafted queries that match embedded content.
Some of the most common RAG security issues include:
- Embedding sensitive data without redaction or classification
- Vector databases exposed via open endpoints or weak access controls
- Retriever logic that matches too broadly or lacks context limits
- No access enforcement between users and embeddings
- No logging of retriever queries or responses for audit trails
You should assess your RAG architecture by:
- Mapping the full pipeline, including retriever and vector DB interactions
- Validating what gets embedded and whether it contains PII or IP
- Probing retriever endpoints for unauthorized access
- Running adversarial prompts to simulate real-world data extraction
- Verifying that context windows are scoped and isolated
- Logging all retrievals and mapping responses to source data
This is more effective than just testing prompts.
Before going live, you should:
- Ingest only sanitized and classification-tagged data
- Apply namespace-level restrictions on retrievers
- Enforce role-based access to vector stores
- Detect and block PII at embedding time
- Monitor all retriever outputs and log them by user
- Include RAG pipelines in your threat modeling and governance reviews
- Align your controls with OWASP LLM and NIST AI RMF standards
There is no single standard yet, but two widely used frameworks are:
- OWASP Top 10 for LLMs: Includes risks like data leakage via prompt injection or model misuse.
- NIST AI Risk Management Framework (AI RMF): Covers governance, data mapping, measurement, and risk controls for AI systems, including RAG.
Aligning your RAG system with these gives you defensible and structured risk management.
If your RAG system is exposed to users (internal or external) and you skip proper controls, you risk:
- Data leaks from internal documents
- Violations of GDPR, HIPAA, or other compliance standards
- Shadow data exposure from dev or staging environments
- Reputational damage from uncontrolled model outputs
- Lack of auditability during incident response or compliance reviews
Yes. A specialized adversarial assessment will go beyond prompt testing to evaluate how your retriever, vector store, embedding logic, and LLM output interact. Red teams can simulate insider threats, prompt injections, and data-mining attacks to expose weak points in your RAG design.
we45 offers a dedicated RAG System Security Assessment. It’s built for enterprise teams who need to validate their entire AI retrieval pipeline against real-world attacks. The review maps to OWASP, NIST, and threat modeling practices, and delivers clear findings with remediation guidance.