Three Documents Were Enough: A RAG Poisoning Attack With a 95% Success Rate

Security researcher Amine Raji needed just three documents and a MacBook Pro to make an enterprise AI system fabricate its company's finances.

In a proof-of-concept published this week, Raji showed that a RAG system running entirely locally, with no GPU or API access, could be manipulated into reporting false revenue figures by injecting three crafted documents into its vector database. The target was a Qwen2.5-7B-Instruct model that, once poisoned, reported Q4 2025 revenue as $8.3 million (down 47% year-over-year) against the real figure of $24.7 million with a $6.5 million profit. Across 20 independent runs, the attack succeeded 19 times.

No adversarial payloads, no exploits. The technique required nothing beyond vocabulary engineering: load the injected documents with the same domain-specific financial terminology as legitimate sources so they outrank real content on cosine similarity at retrieval time, then dress them in corporate authority language ("CFO-Approved Correction" memos, "Emergency Board Communications") to push the model toward treating fabricated figures as ground truth. RAG was designed to reduce hallucination by grounding LLMs in verified sources. This attack inverts that assumption, using the retrieval mechanism to force a specific, controllable falsehood on demand.

The technique maps to what the PoisonedRAG paper (USENIX Security 2025) formalizes as a two-condition attack: poisoned documents must land in the top-k retrieved results, and they must carry enough authority framing to override legitimate context in the generation step. Raji's demonstration shows that satisfying both conditions requires nothing more sophisticated than careful document drafting.

For organizations running agentic pipelines, the exposure is uncomfortable. Any workflow where external parties can contribute documents to a shared knowledge base is a candidate: vendor portals, contractor file uploads, integrated third-party data feeds. In agentic systems where models autonomously retrieve context and act on it, a corrupted knowledge base can silently redirect decisions or downstream automated actions without producing obvious errors.

Raji tested five defensive layers against the attack. Embedding anomaly detection at ingestion was the most effective single control, cutting the success rate from 95% to 20%. Stacking all five layers brought it to 10%. That residual rate matters. A one-in-ten success rate against a fully-defended system is not a comfortable margin for financial reporting, and it illustrates how difficult document-layer attacks are to fully suppress once an adversary has write access to the knowledge base.

The demonstration is the fourth in Raji's open-source MCP Attack Labs repository, which maps the attack surface of modern agentic deployments across MCP tool poisoning, Docker supply-chain prompt injection, automated red-teaming, and agentic memory attacks. The full code, including attack and defense implementations with a Jupyter playbook, is publicly reproducible using LM Studio, Python, and ChromaDB.

RAG has become the dominant architecture for grounding enterprise LLMs in proprietary data, and adoption has outpaced the security practices around it. Knowledge base integrity is increasingly where the consequential attacks will land.