Security researcher Amine Raji, PhD, has published a reproducible demonstration showing that RAG (Retrieval-Augmented Generation) systems — the knowledge-base architecture underpinning most enterprise AI deployments — can be manipulated into reporting entirely fabricated information using nothing more than carefully worded documents and write access to the knowledge store. In his lab, published as part of the mcp-attack-labs repository, Raji injected three fabricated documents into a ChromaDB vector store and prompted a locally-running Qwen2.5-7B-Instruct model to report on company financials. The system responded with a fabricated $8.3M revenue figure and a description of workforce reductions — suppressing the legitimate $24.7M revenue document that was present in the retrieved context. The attack achieved a 95% success rate across 20 independent runs, required no GPU, no cloud infrastructure, and no exploitation of any software vulnerability. The entire setup ran on a MacBook Pro in under ten minutes.
The attack is grounded in the PoisonedRAG framework formalized by Zou et al. at USENIX Security 2025, which identifies two necessary conditions for success: poisoned documents must outscore legitimate ones in cosine similarity rankings (the Retrieval Condition), and once retrieved, their content must steer LLM output toward the attacker's desired answer (the Generation Condition). Where the academic paper relied on gradient-optimized adversarial payloads requiring knowledge of the embedding model, Raji's approach uses what he calls vocabulary engineering — crafting documents in corporate register prose laden with high-salience financial terms and authority framing such as "CFO-Approved Correction" and "CORRECTED FIGURES." One of the three injected documents explicitly references the legitimate $24.7M figure and marks it as superseded, so the LLM treats the contradiction as already resolved before answering. Raji notes this is closer to soft prompt injection than classical retrieval poisoning, and explains why the LLM's flat context window — which assigns no architectural trust hierarchy to retrieved chunks — makes the attack structurally difficult to prevent at the generation layer.
Defense testing revealed a sharp hierarchy among available controls. Embedding anomaly detection at ingestion time — flagging documents that arrive in coordinated clusters with tight cosine proximity to one another — reduced attack success from 95% to 20%, far outperforming prompt hardening (which dropped success only to 85%) or output monitoring alone. Combining all five defense layers (anomaly detection, prompt hardening, access control, output monitoring, and architectural separation) brought the residual attack rate to 10%. No single post-retrieval control came close. The gap is structural: because the LLM has no architectural mechanism to distinguish retrieved content from trusted instructions, any defense applied after retrieval is fighting the model's own design. For <a href="/news/2026-03-14-captain-yc-w26-launches-automated-rag-platform-for-enterprise-ai-agents">enterprise RAG deployments</a>, that makes the ingestion pipeline — not the model or the prompt — the primary security perimeter.
Not every poisoning attack requires a deliberate adversary. Commenters on Hacker News noted that the same retrieval-condition vulnerability is created routinely by ordinary document hygiene failures: outdated analyses, contradictory policy documents, and uncurated legacy content all produce the retrieval dynamics this attack exploits intentionally. Raji points out that write access to enterprise knowledge bases is typically held not just by administrators but by content managers, analysts, contractors, and automated ingestion pipelines — meaning the threat actor profile for this attack overlaps substantially with the population of people who already have legitimate document access. The OWASP LLM Top 10 for 2025 formally classifies this surface as LLM08:2025 (Vector and Embedding Weaknesses), but as Raji's lab makes concrete, the gap between academic threat modeling and accessible, practitioner-level exploitation has closed considerably. The skill floor for this class of attack, he argues, is now effectively "can write a convincing memo."