Giorgio Liapakis gave Claude Code $1,500 and full control of a Meta Ads account for 31 days in January. The goal: promote his Growth Computer newsletter and acquire qualified subscribers for under $2.50 per lead. The agent generated ad images, managed campaigns, spun up landing page variants, and pulled its own analytics. Daily human involvement totaled about two minutes: typing a command into a terminal and skimming the agent's readable output. A human media buyer would spend one to two hours daily on the same work. Final tally: $1,493 spent, 243 leads, $6.14 cost per lead. More than double the target.
It bombed.
But the architecture matters here. Each morning the agent woke up with zero memory, read its own accumulated logs (over 5,500 lines of reasoning across the experiment), pulled fresh performance data from Meta, made decisions, then wrote everything down and committed it to git. Liapakis built this on Claude Code running through Cowork, Anthropic's general-purpose agent runtime released in January. The system kept daily hypotheses and confidence levels. It set revisit triggers for ideas that didn't pan out. Things engineers do with code, applied to marketing decisions that normally live in someone's head.
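The loop above can be sketched in a few lines. This is a minimal illustration of the pattern, not Liapakis's actual code: `fetch_meta_metrics` and `run_agent` are hypothetical stand-ins for the Meta API pull and the agent invocation, and the log format is invented.

```python
# Sketch of the stateless daily loop: read the log, pull fresh data,
# decide, append reasoning, commit. Helper functions are placeholders.
import subprocess
from datetime import date
from pathlib import Path

LOG = Path("decisions.log")

def fetch_meta_metrics() -> dict:
    # Placeholder: would call the Meta Marketing API for fresh spend/lead data.
    return {"date": str(date.today()), "spend": 48.20, "leads": 11}

def run_agent(context: str, metrics: dict) -> str:
    # Placeholder: would invoke the agent with the accumulated log plus
    # today's numbers, returning its written reasoning for the day.
    cpl = metrics["spend"] / metrics["leads"]
    return f"{metrics['date']}: CPL {cpl:.2f}. Hold budgets."

def daily_run() -> None:
    # 1. Wake with zero memory: all context comes from the committed log.
    context = LOG.read_text() if LOG.exists() else ""
    # 2. Pull fresh performance data.
    metrics = fetch_meta_metrics()
    # 3. Decide, then append the day's reasoning to the log.
    entry = run_agent(context, metrics)
    with LOG.open("a") as f:
        f.write(entry + "\n")
    # 4. Commit so every day's decision is versioned and auditable.
    subprocess.run(["git", "add", str(LOG)], check=True)
    subprocess.run(["git", "commit", "-m", f"log {metrics['date']}"], check=True)
```

The git commit is the detail doing the work: it turns each day's hypotheses and confidence levels into a permanent, diffable record instead of context lost at the end of a session.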
And it discovered something useful. Ugly ads win.
Whiteboard sketches and notebook-style creatives outperformed polished content in Meta feeds, which makes sense given the competition from brands with big production budgets. Day 12 brought a breakout: a whiteboard ad hit $1.29 per lead. The agent bumped the budget 20% and documented why.
Then the paperclip problem showed up. The agent optimized for cheap leads, not good ones. It took until Day 16 to actually check who was signing up. Cleaning companies. Recruitment agencies. People who thought "growth" meant something unrelated. When Liapakis intervened with email validation to fix quality, performance tanked further.
This is the real lesson. The loop pattern works: agents handle repetitive work and build useful heuristics with minimal oversight. But they chase the wrong goals fast, and they fall into measurement traps quicker than humans do. Defining the right objective function is harder than building the agent. The agent can't exercise judgment about which leads are worth pursuing. It just optimizes what you tell it to optimize, and it does so relentlessly.
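The measurement trap can be made concrete. A sketch with invented numbers: a raw cost-per-lead objective rewards cheap junk signups, while a cost-per-qualified-lead objective ranks the campaigns the other way around.

```python
# Illustration of the objective-function trap. All numbers are invented.

def cost_per_lead(spend: float, leads: int) -> float:
    return spend / leads

def cost_per_qualified_lead(spend: float, leads: int, qualified_rate: float) -> float:
    # Same spend, but divided only by leads that match the target audience.
    return spend / (leads * qualified_rate)

# Campaign A: cheap leads, mostly off-target (cleaning companies, recruiters).
# Campaign B: pricier leads, mostly the intended audience.
a_cpl = cost_per_lead(100, 78)                    # ~1.28 per lead
b_cpl = cost_per_lead(100, 25)                    # 4.00 per lead
a_cpql = cost_per_qualified_lead(100, 78, 0.10)   # ~12.82 per qualified lead
b_cpql = cost_per_qualified_lead(100, 25, 0.60)   # ~6.67 per qualified lead
# Optimizing raw CPL picks A; optimizing qualified CPL picks B.
```

An agent given only the first metric will do exactly what this one did: pour budget into campaign A and report success, until someone finally looks at who is actually signing up.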