RAG Systems are Leaking Sensitive Data

PUBLISHED: October 16, 2025 | BY: Abhay Bhargav

RAG systems are leaking sensitive data… and most teams have no idea it’s happening.

Yes, the problem is the output of the LLM, but it’s also the entire RAG pipeline: how you build embeddings, what your retrievers surface, and what your vector store contains. Sensitive internal documents, source code, and even regulated data can get pulled into responses without anyone noticing.

You might think your prompts are safe. But attackers aren’t targeting prompts. They’re exploiting weak defaults, insecure configurations, and the fact that most teams haven’t built a real testing model for these systems.

Table of Contents

  1. RAG systems introduce new leakage risks
  2. Your RAG pipeline has more attack surface than you think
  3. How to properly assess the security of a RAG system
  4. What to fix before your RAG system goes live
  5. RAG as an attack surface

RAG systems introduce new leakage risks

Standard LLMs are already complex to secure. But once you introduce a RAG system, you’re no longer just managing prompt input and model behavior. You’re also injecting your own data into the pipeline, and that data is usually sensitive, unstructured, and retrieved dynamically. That changes the threat surface entirely.

Here’s what most teams overlook when they adopt RAG.

You’re feeding the model your own internal data

With RAG, you augment the LLM’s responses by pulling context from your own data sources. This means anything embedded in your vector store (PDFs, internal wikis, customer records, config files) can be retrieved and shown in responses. If that data wasn’t cleaned, classified, or scoped properly before ingestion, it becomes retrievable with the right query.

We’ve seen internal emails, HR documents, and unpublished strategy decks come back in output just because they were sitting in a directory that got batch-embedded without filters.

Leak paths start with how you build the pipeline

There are three common ways data exposure happens in RAG setups:

  • Overexposed embeddings: Teams embed massive document corpora without proper redaction or classification. PII, credentials, and sensitive business data end up indexed and retrievable without ever passing through traditional DLP or approval gates.
  • Permissive retrievers: Most retriever logic is too broad. Instead of narrowing results to verified and relevant chunks, the system pulls anything loosely related to the query. This increases the risk of irrelevant or sensitive data being shown just because it matched a vector similarity threshold.
  • Unsecured vector stores: If your vector index is exposed via an open endpoint or lacks proper access controls, it’s a data breach waiting to happen. Attackers don’t need to attack the LLM itself, just the store behind it.

It only takes one query to surface sensitive data

In internal red-team assessments, we’ve triggered leaks from otherwise safe LLM apps simply by prompting them with organizational terms or jargon. Because the vector store contained embedded internal documents, the LLM confidently returned answers with excerpts containing PII, credentials, or contract clauses.

In one case, a red team member retrieved API keys that had been buried in a markdown file embedded weeks earlier. In another, a prompt about company priorities returned slides from a non-public board deck.

These were RAG-specific failures: overly permissive pipelines built without threat modeling, policy enforcement, or visibility into what was getting embedded and exposed.

Even if you’ve sandboxed the model, blocked unsafe prompts, and tested the base LLM for jailbreaking, those controls don’t touch the RAG layer. This is a different surface. It needs its own review model focused on what’s in the store, how retrieval behaves, and what data can realistically be surfaced from a casual internal query.

Your RAG pipeline has more attack surface than you think

Most RAG pipelines look clean from the outside. One LLM, a vector DB, maybe a retriever. But once you trace the data flow end to end, the exposure points multiply quickly. Each component brings its own risks, and most security teams haven’t fully mapped what those risks look like in practice.

Here’s how the attack surface actually breaks down.

Data sources are where the leaks begin

Everything starts with what you feed into the system. If you’re ingesting unfiltered internal documents, wikis, meeting notes, or support tickets, sensitive data gets embedded by default. Most teams don’t apply DLP, data classification, or redaction before indexing. That’s a direct path to exposure.

We’ve seen entire customer onboarding guides, HR spreadsheets, and config files with tokens get embedded just because they were in a shared folder someone pointed the pipeline at.

Embedding models strip context and classification

Once a document gets embedded, you lose metadata like access levels, sensitivity labels, or document owners. All that control disappears. Embeddings are just vectors. That means your retriever can pull from high-sensitivity content without knowing it’s doing so. There’s no built-in concept of confidential unless you explicitly engineer for it.
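
Most vector stores let you attach metadata to each record, but nothing forces you to carry it through. Here is a minimal sketch of preserving that context at indexing time, assuming an illustrative chunk schema; the labels and field names are examples, not a specific product feature.

```python
# Keep classification metadata next to the vector so the retriever can filter
# on it later. Schema and labels are illustrative.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    vector: list[float]            # embedding from whatever model you use
    source_doc: str                # where the text came from
    sensitivity: str = "internal"  # e.g. "public", "internal", "confidential"
    owner: str = "unknown"

def index_chunk(index: list, text: str, vector: list, source_doc: str,
                sensitivity: str, owner: str) -> None:
    """Refuse to index anything that arrives without a recognized label."""
    if sensitivity not in {"public", "internal", "confidential"}:
        raise ValueError(f"Unlabeled or unknown sensitivity for {source_doc}")
    index.append(Chunk(text, vector, source_doc, sensitivity, owner))
```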

Vector databases are often misconfigured

We’ve tested RAG setups with vector DBs exposed over open APIs, running with default configs, or protected by nothing more than a weak token. In several cases, staging and dev environments had indexes pointing to live production embeddings. That’s shadow data waiting to be queried.

ACLs are another weak spot. Most vector DBs don’t enforce fine-grained access control by default. If your application pulls from multiple namespaces, a simple prompt injection can trigger cross-namespace retrieval unless you’ve enforced isolation explicitly.

Retriever logic is overly broad by default

Out-of-the-box retrievers match based on similarity instead of intent. That means they’ll pull anything that’s semantically close to the query even if it wasn’t meant to be included. Without tight filters or context-based constraints, you end up surfacing chunks that don’t belong in the user’s scope.

We’ve seen simple prompts like “summarize company goals” trigger retrievals from board decks, roadmap discussions, or M&A strategy documents. No special jailbreak required.
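
To make that concrete, here is a minimal sketch of what out-of-the-box retrieval effectively does, written in plain NumPy over an illustrative chunk schema. Everything downstream inherits whatever this returns.

```python
# Similarity-only retrieval: the top-k closest chunks come back regardless of
# who is asking or how sensitive the source document is.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def naive_retrieve(query_vec: np.ndarray, chunks: list, k: int = 5) -> list:
    # No namespace scoping, no sensitivity filter, no caller identity: a board
    # deck and a public FAQ are treated the same if their vectors sit near the query.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]
```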

Real-world attack paths are simple and effective

Here are some patterns that show up again and again in red-team testing:

  • Prompt injections that trick the retriever into pulling from hidden or sensitive namespaces.
  • Queries crafted to match rare embeddings and surface data that doesn’t normally appear in chat responses.
  • Access control bypasses caused by retrievers operating outside the app’s auth model.
  • Shadow indexes exposed from dev environments where sensitive embeddings were never meant to be live.

In one case, a red teamer was able to query a staging environment’s vector DB that had been seeded with real production data. None of the access controls were enforced because the endpoint was assumed to be test-only.

Your LLM is just one part of the system. The real attack surface starts with your documents, moves through how you embed and store them, and ends with how your retriever serves them up. If you’re not securing each layer, the gaps stack up. And when they do, a single query can turn into a data leak.

How to properly assess the security of a RAG system

Most red team exercises for LLMs stop at prompt testing, and that’s simply not enough. If you’re using RAG, the risk is in the architecture. You’re exposing new paths for data retrieval, transformation, and leakage that traditional AppSec reviews don’t cover.

Here’s how to run a RAG-specific security assessment that actually finds the problems before attackers or auditors do.

1. Map the architecture end to end

Start by diagramming how the RAG pipeline works. Include the embedding model, retriever, vector store, LLM, and any pre- or post-processing logic. Document where these components live (prod, staging, serverless) and how they’re accessed. This gives you the real attack surface instead of just the public-facing API.

2. Trace data lineage from source to embedding

Figure out where your input data comes from. What sources feed the vector store? Who owns them? What controls exist before ingestion? Then inspect what happens during embedding: are classification tags lost? Is PII stripped? If you’re batch-embedding large internal corpora, there’s a good chance you’re exposing documents that were never meant to be surfaced.

Use a vector extractor to inspect sample embeddings and validate what kind of content is recoverable.
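
One simple way to do that, assuming you can enumerate stored chunks alongside their original text, is to scan them for obviously sensitive patterns. The regexes below are illustrative, not a complete DLP ruleset.

```python
# Audit what is actually sitting in the index, not what you think you embedded.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def audit_chunks(chunks):
    """Yield (chunk_id, finding_type, match) for anything that looks sensitive."""
    for chunk in chunks:
        for name, pattern in PATTERNS.items():
            for match in pattern.findall(chunk["text"]):
                yield chunk["id"], name, match

# Example: one stored chunk that should never have been embedded.
sample = [{"id": "doc-42#3", "text": "Contact ops@example.com, key AKIA1234567890ABCDEF"}]
for finding in audit_chunks(sample):
    print(finding)
```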

3. Probe vector DB access for leaks and misconfigurations

Test access to the vector store directly. Check for weak or missing auth, over-permissioned access roles, or endpoints exposed in staging. Query the vector index using common terms, internal jargon, and fuzzed inputs. You’re looking for retrievable data that shouldn’t be there.

Dump sample results and compare them against your data classification policy. If you’re pulling sensitive data without triggering an alert, that’s a gap.
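
A first pass can be as simple as hitting candidate endpoints without credentials and seeing what comes back. In this sketch the host, port, and paths are hypothetical placeholders; substitute whatever your vector DB actually exposes.

```python
import requests

# Hypothetical endpoints; replace with your real staging and prod vector DB URLs.
CANDIDATE_ENDPOINTS = [
    "http://staging-vectors.internal:6333/collections",
    "http://staging-vectors.internal:6333/collections/prod-docs/points",
]

for url in CANDIDATE_ENDPOINTS:
    try:
        resp = requests.get(url, timeout=5)  # deliberately no auth header
    except requests.RequestException as exc:
        print(f"{url}: unreachable ({exc})")
        continue
    if resp.status_code == 200:
        print(f"{url}: readable without credentials, treat as a finding")
    else:
        print(f"{url}: rejected with HTTP {resp.status_code}")
```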

4. Run adversarial prompts with targeted intent

Don’t just prompt for jailbreaks. Craft queries that simulate real-world insider or lateral movement attempts. Ask vague questions that mimic curiosity or data-mining behaviors. Use prompts that reference internal project names, business units, or acronyms. These often trigger retrievals from buried document chunks that bypass traditional red team payloads.

Use diffing tools to compare outputs across queries and track how often sensitive embeddings are exposed.
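
One lightweight approach is to diff answers across paraphrased variants of the same question; large divergence usually means different chunks were retrieved and is worth a manual look. A minimal sketch using the standard library, where ask_rag stands in for however you call your pipeline:

```python
import difflib

def drift_report(ask_rag, variants, threshold: float = 0.6) -> None:
    """Flag answers that diverge enough to suggest different chunks were retrieved."""
    answers = [ask_rag(v) for v in variants]
    baseline = answers[0]
    for prompt, answer in zip(variants[1:], answers[1:]):
        ratio = difflib.SequenceMatcher(None, baseline, answer).ratio()
        if ratio < threshold:
            print(f"Low similarity ({ratio:.2f}) for variant: {prompt!r}")
            print("  Inspect the chunks retrieved for this variant manually.")

VARIANTS = [
    "Summarize our company goals for this year.",
    "What are the current company priorities?",
    "Give me an overview of strategic objectives.",
]
```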

5. Test context window boundaries for user isolation

Validate that your system enforces context boundaries between users. If embeddings from one session bleed into another, you have a leakage path. Push the context window to its limit. Interleave prompts from different domains and verify whether responses include data from unrelated sessions or users.

Context isolation is one of the most missed risks in early-stage RAG implementations.
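
A quick way to check isolation is a canary test: plant a unique string in one session and confirm that no other session can surface it. A minimal sketch, assuming a session-scoped chat() function that stands in for your application:

```python
import uuid

def chat(session_id: str, prompt: str) -> str:
    raise NotImplementedError("call your application here")

def check_session_isolation() -> None:
    canary = f"CANARY-{uuid.uuid4()}"
    session_a, session_b = "user-a-session", "user-b-session"

    # Plant the canary in session A only.
    chat(session_a, f"Remember this internal reference code: {canary}")

    # Session B should never be able to recover it.
    answer = chat(session_b, "List any internal reference codes you know about.")
    assert canary not in answer, "Context leaked across sessions"
```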

6. Log everything and analyze the output trail

Enable full logging for inputs, retrieved chunks, and LLM outputs. Don’t just look at what the user sees; also track the provenance of each response. Was a specific paragraph pulled from a restricted doc? Did a retrieval include a chunk that wasn’t supposed to be indexed?

Cross-reference output content with your document corpus. If the model is surfacing unexpected information, it’s a retrieval problem and not a prompt issue.
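
A minimal sketch of what that provenance logging could look like, with illustrative field names; the point is that every retrieval event records the user, the query, and the source and sensitivity of each chunk returned.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag.retrieval")

def log_retrieval(user_id: str, query: str, chunks: list) -> None:
    """Record provenance for every retrieval so outputs can be traced back to sources."""
    log.info(json.dumps({
        "ts": time.time(),
        "user": user_id,
        "query": query,
        "chunks": [
            {
                "id": c["id"],
                "source": c["source_doc"],
                "sensitivity": c.get("sensitivity", "unlabeled"),
            }
            for c in chunks
        ],
    }))
```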

Tools to make this work

Use targeted tools at each layer of the RAG stack to uncover misconfigurations, weak policies, and silent leakage paths. Here’s what your assessment toolkit should include:

  1. Embedding extractors: Extract and inspect vector embeddings to validate what content is actually being indexed. Helps identify whether PII, credentials, or confidential business logic are encoded and retrievable.
  2. Vector index dumpers: Dump the contents of your vector store to audit embedded data directly. Useful for identifying improperly scoped or over-broad document ingestion pipelines.
  3. Retriever fuzzers: Send malformed, adversarial, and low-confidence queries to your retriever logic. Used to test whether broad matching surfaces unintended content or bypasses retrieval filters.
  4. Access control testers: Simulate unauthenticated or mis-scoped access to your vector DB endpoints. Look for open APIs, weak token enforcement, over-permissioned roles, and shadow staging environments with prod data.
  5. Context window bloat testers: Automate multi-session prompt testing to detect cross-user context leakage or injection bleed. Essential for checking whether embedding scope or memory controls are actually enforced.
  6. LLM output diffing tools: Compare outputs across variations of the same query to detect sensitive drift, inconsistent retrievals, or context bleed. Helps catch subtle leakage patterns missed in single-query tests.
  7. Data lineage trackers: Trace data flow from ingestion source to embedded chunk to retrieved output. Validate whether classification metadata, sensitivity labels, or auth scopes are preserved (or lost) during each step.
  8. Audit log analyzers: Parse and review logs from each pipeline component, such as retrievers, vector DB, and LLM. Correlate queries, retrieved chunks, and final outputs to understand how and why a sensitive response was generated.

What to fix before your RAG system goes live

Delaying your RAG deployment is not the answer. You need to ship it with controls that actually reduce risk. The controls below are the baseline if you’re planning to expose RAG to internal teams or external users.

Here’s what needs to be fixed, enforced, and continuously validated before you launch.

Sanitize and classify before ingestion

Every document going into your vector store should be scrubbed for sensitive content and tagged with classification metadata. That includes PII, credentials, secrets, internal project details, and anything regulated. Don’t batch-embed without inspection; otherwise you’re seeding future leaks. A minimal ingestion gate is sketched after the list below.

  • Strip or redact sensitive fields before embedding.
  • Require classification labels in metadata or doc headers.
  • Use automated PII and secret detection as a gate before ingestion.
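
Here is a minimal sketch of such a gate, assuming documents arrive as a path, raw text, and a classification label. The patterns and label set are illustrative; a real pipeline would delegate to proper DLP and secret-scanning tooling.

```python
import re

SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                      # AWS-style access key
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key material
]
ALLOWED_LABELS = {"public", "internal"}

def ingestion_gate(path: str, text: str, label: str) -> str:
    """Return redacted text ready to embed, or raise if the document fails the gate."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"{path}: missing or disallowed classification label ({label!r})")
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            raise ValueError(f"{path}: secret-like content detected, refusing to embed")
    # Redact emails rather than rejecting the whole document.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED_EMAIL]", text)
```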

Control what retrievers are allowed to access

Retrievers should never pull from unrestricted or unfiltered indexes. Apply namespace boundaries and enforce them strictly. If your retriever can query across multiple datasets, scope them by role, purpose, or use case.

  • Enforce per-user and per-query namespace restrictions.
  • Use access control lists to tie retrievers to specific indices.
  • Prevent cross-tenant or cross-domain queries by default.
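
A minimal sketch of that enforcement in front of a generic retriever; the role-to-namespace map and the search() callable are illustrative, not a specific vector DB client.

```python
# Map each application role to the namespaces it may query (illustrative).
ROLE_NAMESPACES = {
    "support_agent": {"public-docs", "support-kb"},
    "finance": {"public-docs", "finance-reports"},
}

def scoped_search(search, user_role: str, query_vec, namespace: str, k: int = 5):
    """Delegate to the underlying retriever only if the namespace is allowed for this role."""
    allowed = ROLE_NAMESPACES.get(user_role, set())
    if namespace not in allowed:
        raise PermissionError(f"role {user_role!r} may not query namespace {namespace!r}")
    return search(query_vec, namespace=namespace, k=k)
```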

Gate embeddings with role-based access controls

Just because users can query a retriever doesn’t mean they should see every chunk in the store. Embedding-level access control matters just as much as LLM prompt validation. A minimal post-retrieval filter is sketched after the list below.

  • Set query permissions by user role and embedding namespace.
  • Block queries that return unauthorized content, even if vector similarity is high.
  • Map embedding visibility to your existing data access policies.
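
A minimal sketch of a post-retrieval authorization filter: chunks whose sensitivity exceeds the caller’s clearance are dropped no matter how well they matched. The clearance levels and chunk schema are illustrative.

```python
CLEARANCE = {"public": 0, "internal": 1, "confidential": 2}
ROLE_CLEARANCE = {"contractor": 0, "employee": 1, "executive": 2}

def authorize_chunks(user_role: str, chunks: list) -> list:
    """Drop chunks above the caller's clearance; unlabeled chunks are treated as confidential."""
    max_level = ROLE_CLEARANCE.get(user_role, 0)
    allowed, blocked = [], []
    for chunk in chunks:
        level = CLEARANCE.get(chunk.get("sensitivity", "confidential"), 2)
        (allowed if level <= max_level else blocked).append(chunk)
    if blocked:
        # A blocked chunk is still a signal worth logging: the retriever matched
        # content this user was never supposed to see.
        print(f"Blocked {len(blocked)} chunk(s) above clearance for {user_role}")
    return allowed
```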

Monitor and log retriever activity

You need full visibility into what gets retrieved, instead of just what the LLM outputs. If a sensitive chunk was returned by the retriever, that’s a data exposure event even if the final response masked it.

  • Log every retriever query and the chunks returned.
  • Link retriever outputs to user identity and session context.
  • Analyze logs for drift, overexposure, or policy bypass attempts.

Apply governance and link to real frameworks

Treat your RAG pipeline like a high-risk component. Threat model it. Validate it. Monitor it. Don’t leave it out of your AI governance strategy.

  • Build RAG-specific threat scenarios into your design reviews.
  • Map controls to NIST AI RMF (e.g. data mapping, access governance, measurement).
  • Link findings and fixes to OWASP LLM risk categories.

Make assessments repeatable

A single pre-launch review won’t hold up six months later. RAG pipelines evolve. Your controls need to keep up.

  • Schedule periodic assessments of retriever behavior and embedding scope.
  • Build CI gates for embedding ingestion and retriever policy checks.
  • Track changes to vector stores and monitor for scope creep or data sprawl.

Getting this right doesn’t mean locking everything down. It means putting controls where the real risks live. If your vector store is exposed, your prompts are irrelevant. And if your retriever behavior isn’t auditable, you’ll never know what leaked until someone shows you a screenshot.

RAG as an attack surface

RAG pipelines are core infrastructure with live access to your data, and they’re already being exploited in environments that assumed LLM security was enough. Most teams are still focused on prompt safety, while embedding and retriever logic remain unaudited, unfiltered, and unmonitored.

Relying on RAG to power internal tools, customer-facing features, or product intelligence means that you’re now operating a data delivery system that needs policies, enforcement, monitoring, and real tests. 

In the next 12 to 18 months, expect tighter regulation around how enterprise AI systems handle retrieval, classification, and user context. Audit logs, retriever scope, and embedding boundaries will become standard parts of risk reviews.

RAG is not the future. It’s already in production. And not testing it is as good as accepting that you have blind spots you’re not willing to fix.

But what if your RAG system is really exposed?

we45’s RAG System Security Assessment gives you a full-stack, adversarial review of your implementation that covers data ingestion, vector stores, retriever logic, and LLM outputs. We test your system the way attackers will:

  • Can sensitive embeddings be retrieved without access?
  • Can prompt injections pull from hidden namespaces?
  • Are retrievers surfacing data they shouldn’t, and are you logging it?

You’ll get a clear, actionable report mapped to OWASP LLM, NIST AI RMF, and real-world risk without any black-box scans.

FAQ

What is a RAG system and why does it pose a security risk?

A Retrieval-Augmented Generation (RAG) system combines a large language model (LLM) with a retriever and a vector store that injects custom data into the model’s responses. The risk comes from that injected data. If sensitive documents are embedded without proper controls, the retriever can surface private, regulated, or confidential information through simple queries.

Can LLMs leak sensitive data through RAG even if the base model is secure?

Yes. A secure LLM with strong prompt filtering can still leak sensitive data if the RAG pipeline is poorly configured. Leakage often occurs through exposed vector stores, weak retriever logic, or embedded documents that contain sensitive content. The base model doesn’t filter the data it’s fed through the retriever.

How do attackers exploit RAG systems?

Attackers typically exploit RAG pipelines using prompt injections, broad or vague queries, or direct API access to retrievers or vector stores. They may bypass authentication, trigger retrieval from hidden namespaces, or extract sensitive chunks using carefully crafted queries that match embedded content.

What are common RAG pipeline vulnerabilities?

Some of the most common RAG security issues include:

  • Embedding sensitive data without redaction or classification
  • Vector databases exposed via open endpoints or weak access controls
  • Retriever logic that matches too broadly or lacks context limits
  • No access enforcement between users and embeddings
  • No logging of retriever queries or responses for audit trails

How can I test the security of my RAG system?

You should assess your RAG architecture by:

  • Mapping the full pipeline, including retriever and vector DB interactions
  • Validating what gets embedded and whether it contains PII or IP
  • Probing retriever endpoints for unauthorized access
  • Running adversarial prompts to simulate real-world data extraction
  • Verifying that context windows are scoped and isolated
  • Logging all retrievals and mapping responses to source data

This is more effective than just testing prompts.

What controls should be in place before deploying a RAG system?

Before going live, you should:

  • Ingest only sanitized and classification-tagged data
  • Apply namespace-level restrictions on retrievers
  • Enforce role-based access to vector stores
  • Detect and block PII at embedding time
  • Monitor all retriever outputs and log them by user
  • Include RAG pipelines in your threat modeling and governance reviews
  • Align your controls with OWASP LLM and NIST AI RMF standards

Is there a standard for securing RAG systems?

There is no single standard yet, but two widely used frameworks are:

  • OWASP Top 10 for LLMs: Includes risks like data leakage via prompt injection or model misuse.
  • NIST AI Risk Management Framework (AI RMF): Covers governance, data mapping, measurement, and risk controls for AI systems, including RAG.

Aligning your RAG system with these gives you defensible and structured risk management.

What happens if I ignore RAG security risks?

If your RAG system is exposed to users (internal or external) and you skip proper controls, you risk:

  • Data leaks from internal documents
  • Violations of GDPR, HIPAA, or other compliance standards
  • Shadow data exposure from dev or staging environments
  • Reputational damage from uncontrolled model outputs
  • Lack of auditability during incident response or compliance reviews

Can a red team assess my RAG system?

Yes. A specialized adversarial assessment will go beyond prompt testing to evaluate how your retriever, vector store, embedding logic, and LLM output interact. Red teams can simulate insider threats, prompt injections, and data-mining attacks to expose weak points in your RAG design.

Where can I get help running a secure RAG assessment?

we45 offers a dedicated RAG System Security Assessment. It’s built for enterprise teams who need to validate their entire AI retrieval pipeline against real-world attacks. The review maps to OWASP, NIST, and threat modeling practices — and delivers clear findings with remediation guidance.

Abhay Bhargav

Abhay builds AI-native infrastructure for security teams operating at modern scale. His work blends offensive security, applied machine learning, and cloud-native systems focused on solving the real-world gaps that legacy tools ignore. With over a decade of experience across red teaming, threat modeling, detection engineering, and ML deployment, Abhay has helped high-growth startups and engineering teams build security that actually works in production, not just on paper.