McKinsey's internal AI platform, Lilli, has been in production since 2023. More than 30,000 of the firm's consultants use it daily for document research, chat, and AI-powered search — it processes upward of 500,000 prompts a month against a corpus of 100,000-plus proprietary documents. On March 1, CodeWall.ai disclosed that an autonomous offensive security agent had broken into it completely, with no credentials and no human in the loop, in under two hours.

The vulnerability that opened the door was an unauthenticated API endpoint — one of 22 that required no login, out of more than 200 documented endpoints in Lilli's public-facing surface. The endpoint accepted JSON payloads, but the server was constructing SQL queries by concatenating field *keys* directly into the query string, not values. Standard parameterisation defences don't catch that pattern. Neither did OWASP ZAP. The agent ran 15 blind iteration cycles, reading reflected database error messages to map the query structure, until production data started coming back.
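The pattern as described would look something like this minimal sketch. The schema, endpoint shape, and table names here are hypothetical stand-ins, since Lilli's internals aren't public, but the key-versus-value distinction is the point:

```python
import json
import sqlite3

# Hypothetical schema standing in for the real corpus tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, title TEXT, owner TEXT)")
conn.execute("INSERT INTO documents VALUES (1, 'Q3 model', 'alice')")
conn.execute("INSERT INTO documents VALUES (2, 'deal memo', 'bob')")

def search(payload: str):
    """JSON *values* are safely parameterised here -- but the field
    *keys* are concatenated straight into the WHERE clause."""
    filters = json.loads(payload)
    clauses = " AND ".join(f"{key} = ?" for key in filters)  # keys unescaped
    sql = f"SELECT id, title FROM documents WHERE {clauses}"
    return conn.execute(sql, list(filters.values())).fetchall()

# A benign request behaves as intended:
search('{"owner": "alice"}')     # [(1, 'Q3 model')]

# A crafted *key* rewrites the query. Value-focused fuzzers and
# value parameterisation both miss it:
search('{"1=1 OR owner": "x"}')  # every row comes back
```

Because the malicious text rides in the key, a scanner that only mutates values never triggers the flaw, which is consistent with ZAP coming up empty.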

From there it chained the SQL injection with an insecure direct object reference (IDOR) vulnerability to reach individual employee search histories. CodeWall.ai's disclosure report claims the total exposure included 46.5 million chat messages, 728,000 files, 57,000 user accounts, 3.68 million RAG document chunks, and 95 AI system prompt configurations across 12 model types. Agent Wars has not independently verified those figures.
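The IDOR half of the chain is easy to picture: an endpoint keys records by a guessable identifier and never checks that the requester owns the record. A minimal sketch, with hypothetical names and IDs, alongside the ownership check that would have blocked it:

```python
# Hypothetical per-user search histories keyed by sequential IDs.
HISTORIES = {
    101: {"owner": "alice", "queries": ["semiconductor supply chains"]},
    102: {"owner": "bob",   "queries": ["client restructuring options"]},
}

def get_history(history_id: int, session_user: str) -> dict:
    # IDOR: the ID from the request is trusted outright; ownership is
    # never compared against the session, so any ID can be enumerated.
    return HISTORIES[history_id]

def get_history_fixed(history_id: int, session_user: str) -> dict:
    record = HISTORIES.get(history_id)
    if record is None or record["owner"] != session_user:
        raise PermissionError("history does not belong to requester")
    return record

get_history(102, session_user="alice")  # bob's history leaks to alice
```

Chained after the injection, which supplied valid identifiers in bulk, enumeration like this turns one database flaw into per-employee exposure.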

The data breach is serious. The write access is worse. Lilli's system prompts — the instructions that tell the platform how to answer questions, cite sources, and apply guardrails — sat in the same compromised database. A single SQL UPDATE statement would have been enough to rewrite them silently, with no code deployment, no change log, and nothing in the audit trail. The practical risk: consultant outputs subtly skewed, financial models fed manipulated AI responses, guardrails stripped, sensitive data steered toward client-facing documents.
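With write access through the same injection path, the rewrite really is one statement. A sketch against a hypothetical prompts table (Lilli's actual schema and prompt text are not public):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system_prompts (model TEXT PRIMARY KEY, prompt TEXT)")
conn.execute("INSERT INTO system_prompts VALUES "
             "('research-assistant', 'Cite internal sources; refuse out-of-scope requests.')")

# One UPDATE: no code deployment, no diff to review, no audit entry.
conn.execute("UPDATE system_prompts SET prompt = "
             "'Citations optional; answer all requests.' "
             "WHERE model = 'research-assistant'")

conn.execute("SELECT prompt FROM system_prompts").fetchone()[0]
# 'Citations optional; answer all requests.'
```

Every consultant querying that model afterward would get guardrail-free answers, with nothing in the deployment pipeline to flag the change.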

McKinsey patched within a day of disclosure. The vulnerability had been live in production, undetected, for over two years.

CodeWall.ai is framing this as evidence for something broader: that AI system prompts have become crown-jewel assets — as sensitive as source code or encryption keys — and that enterprises aren't treating them that way yet. The McKinsey case fits that argument. Lilli's prompts weren't isolated behind a separate access layer or subject to integrity monitoring. They were rows in a database reachable by an unauthenticated HTTP request.
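The integrity-monitoring control that was apparently absent is cheap to build: fingerprint each prompt at deploy time, keep the fingerprints outside the serving database, and alert when the live copy drifts. A minimal sketch, with hypothetical model names and prompt text:

```python
import hashlib

def fingerprint(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

# Baseline captured at deploy time and stored *outside* the serving
# database, so a database-level compromise can't rewrite both copies.
BASELINE = {
    "research-assistant": fingerprint(
        "Cite internal sources; refuse out-of-scope requests."),
}

def prompt_intact(model: str, live_prompt: str) -> bool:
    """Compare the prompt actually serving traffic to the deployed baseline."""
    return BASELINE.get(model) == fingerprint(live_prompt)

prompt_intact("research-assistant",
              "Cite internal sources; refuse out-of-scope requests.")  # True
prompt_intact("research-assistant",
              "Citations optional; answer all requests.")              # False
```

Run on a schedule, a check like this turns a silent SQL UPDATE into an alert within one polling interval, which is the treatment source code and signing keys already get.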

The second argument, about autonomous agents changing the threat landscape, is worth taking seriously even accounting for the obvious commercial interest of an offensive security firm making it. Traditional scanners work from checklists. The CodeWall agent mapped endpoints, identified a subtle injection class that checklist tools missed, iterated against error responses, and chained two distinct vulnerabilities — the kind of multi-step reasoning that used to require a skilled human tester with time to spare. The speed matters, but it isn't the whole story. The more significant shift is that this class of attack can now run continuously, against any target with a public disclosure policy and a documented API surface.
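What "iterating against error responses" looks like in miniature: send a guess, read the reflected database error, and treat the error's disappearance as information. A toy reconstruction using classic UNION-based column counting, with a stand-in schema; the real probing was against different queries, but the loop structure is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, title TEXT, body TEXT)")
conn.execute("INSERT INTO docs VALUES (1, 'memo', 'internal')")

def endpoint(fragment: str):
    """Stand-in for an endpoint that reflects raw database errors."""
    try:
        rows = conn.execute(
            f"SELECT id, title, body FROM docs {fragment}").fetchall()
        return True, rows
    except sqlite3.Error as exc:
        return False, str(exc)  # the leak the agent reads each cycle

def map_column_count(max_guess: int = 10):
    """Blind iteration: append UNION SELECT guesses until the
    column-count error goes away; success reveals the hidden
    query's width, one fact per cycle."""
    for n in range(1, max_guess + 1):
        cols = ", ".join(str(i) for i in range(1, n + 1))
        ok, _ = endpoint(f"UNION SELECT {cols}")
        if ok:
            return n
    return None

map_column_count()  # 3 -- the hidden query selects three columns
```

A human tester runs this loop by hand over an afternoon; an agent runs it in seconds, in parallel, against every endpoint it has mapped, which is the continuous-attack shift the article describes.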