GPT-5.4, OpenAI's latest flagship model, can be talked into executing a reverse shell through a chain of riddles and encoded payloads. That's the takeaway from the fifth installment of the BrokenClaw security research series, which tested the model inside OpenClaw, an AI agent framework that connects LLMs to external data sources like web pages and emails. In both test scenarios, the model fetched untrusted content and decoded it without asking, then followed embedded instructions through fake HTTP redirects until it ran a malicious script. OpenClaw had countermeasures in place, explicit security notices telling the model not to trust external content or execute commands from it. The model ignored them.
The attack chain is almost comical in its complexity. The researcher, posting as veganmosfet, set up a webpage that told the agent it was a "302 redirect" (it wasn't; the real HTTP status was 200). The agent followed it anyway, found a base85-then-base64 encoded string, decoded it without asking, got instructions to solve a riddle, fetched more pages, and eventually downloaded and executed a shell script via curl. That script opened a reverse shell connection back to the attacker. The model then cheerfully reported it had decrypted a message using the key "lobster," apparently unaware it had also been compromised. The email attack followed a similar pattern: a malicious email contained encoded instructions that led the agent through multiple decode-and-fetch steps until it ran a Python script that spawned a reverse shell in a background process.
These exploits don't work reliably every time. That's cold comfort when the countermeasures only work sometimes too. The fact that a state-of-the-art model can be nudged into executing arbitrary code through what amounts to a digital scavenger hunt is a real problem for anyone building autonomous agents that interact with untrusted data. This isn't theoretical; the HackMyClaw bounty program, sponsored by email security company Abnormal AI and app security firm Corgea, is offering $1,000 to researchers who can extract sensitive credentials from OpenClaw through indirect prompt injection via email. The contest exists because this attack vector is practical, not academic.
Both OpenAI and Anthropic's best models remain vulnerable to indirect prompt injection. No amount of warning text prepended to external content reliably stops a determined attacker from manipulating an agent's behavior. If you're building systems where AI agents handle email, browse the web, or process any user-controlled input, assume that input can contain instructions. The model will probably follow them.