Anthropic's internal team built Cowork — a full collaborative coding environment — in ten days. Meanwhile, most engineering teams are still struggling to ship anything meaningful with AI assistance, even though they're running the same models. That contrast is the starting point for Bassim Eledath's eight-level framework for agentic engineering, published last week, and it's landing with people who've spotted the same pattern in their own organizations.

Eledath's argument is blunt: the gap between AI benchmark performance and real-world productivity isn't a model problem. It's a practice problem. The models are capable enough. The question is whether the engineers using them know what they're doing.

The framework starts at the bottom, where most people still are. Level 1 is tab completion — GitHub Copilot, autocomplete, the thing every developer has had access to for years. Level 2 is multi-file agent IDEs like Cursor. Both are useful, but Eledath treats them as table stakes, not a competitive advantage.

Things get more interesting at Level 3, what he calls context engineering. This is the craft of making your AI as useful as possible before it writes a single line of code — shaping prompts, maintaining CLAUDE.md files, writing precise tool descriptions. Less glamorous than autonomous agents, but Eledath argues it's load-bearing for everything above it.
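To make the idea concrete, here's a minimal sketch of what a CLAUDE.md file at this level might contain. Everything in it is hypothetical — the conventions, paths, and lessons are invented for illustration, not taken from Eledath's piece:

```markdown
# Project conventions (read before writing code)

- Run the test suite before proposing any change; never suggest a commit with failing tests.
- API handlers live in `src/api/`; shared types live in `src/types/`.
- Prefer small, reviewable diffs: one concern per pull request.

# Lessons learned (appended over time)

- The payments client already retries on 500s; don't add retry logic in callers.
```

The point is that none of this is code: context engineering is mostly writing down, precisely, what a capable new teammate would need to know on day one.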

Level 4 is where he thinks most teams hit the real inflection point. He calls it compounding engineering, crediting the concept to Kieran Klaassen. The loop goes: plan, delegate, assess, codify. That last step is what separates engineers who improve over time from those who run in circles. LLMs have no memory between sessions, so every lesson learned evaporates unless someone writes it back into the rules. The codify step is the difference between a one-time fix and a permanent upgrade to how your team operates.
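The loop above can be sketched in a few lines of Python. This is a hand-rolled illustration of the plan → delegate → assess → codify shape, not Klaassen's or Eledath's implementation; `delegate` and `assess` are hypothetical stand-ins for the agent call and the review step, and the rules file name is an assumption:

```python
from pathlib import Path

# Persistent rules file the agent re-reads every session (assumed name).
RULES_FILE = Path("CLAUDE.md")

def codify(lesson: str) -> None:
    """Write a lesson back into the rules file so it survives the session."""
    existing = RULES_FILE.read_text() if RULES_FILE.exists() else ""
    if lesson not in existing:  # don't accumulate duplicate rules
        RULES_FILE.write_text(existing + f"- {lesson}\n")

def compounding_loop(task, delegate, assess):
    """One turn of plan -> delegate -> assess -> codify.

    `delegate` hands the task to the agent; `assess` is the human (or
    harness) review, returning (ok, lesson_or_None). Both are stubs here.
    """
    result = delegate(task)       # delegate: the agent does the work
    ok, lesson = assess(result)   # assess: review what came back
    if lesson:
        codify(lesson)            # codify: make the lesson permanent
    return ok
```

Without the `codify` call, this is just a request loop; with it, each session's review leaves the rules file slightly better than it found it, which is the compounding Eledath is pointing at.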

The upper levels build from there. Level 5 brings in MCP integrations and custom skills, connecting agents to real tools and external systems. Level 6 adds automated feedback harnesses — test runners, linters, evaluation loops that let agents self-correct without a human in the loop. By Level 7, agents are raising pull requests overnight while the engineers who set them up are asleep. Level 8 is full multi-agent orchestration, with Claude Code as the primary framework for coordinating agents working in parallel. Eledath references Boris Cherny and patterns like the Ralph loop at this tier, alongside tooling like DeepWiki MCP and Braintrust MCP.
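The Level 6 harness idea reduces to a simple control loop: run the checks, hand any failure back to the agent, repeat until green or a retry budget runs out. The sketch below assumes stand-in callables — `propose_fix` for the agent and `run_checks` for the test runner or linter — neither of which is a real API:

```python
def feedback_loop(propose_fix, run_checks, max_rounds=5):
    """Let an agent self-correct against an automated harness.

    `propose_fix(feedback)` asks the agent to edit code, guided by the
    last failure output (None on the first attempt). `run_checks()`
    wraps the test runner / linter and returns (ok, failure_output).
    Both are hypothetical stand-ins for real tooling.
    """
    feedback = None
    for _ in range(max_rounds):
        propose_fix(feedback)        # agent attempts a fix
        ok, feedback = run_checks()  # e.g. pytest or a linter run
        if ok:
            return True              # harness is green; no human needed
    return False                     # budget exhausted; escalate to a human
```

The `max_rounds` budget is the safety valve: it is what lets Level 7 engineers leave this running overnight, because a stuck agent fails over to a human instead of burning tokens forever.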

The most practically urgent idea in the piece, though, has nothing to do with any individual level. Eledath calls it the multiplayer effect. A Level 7 engineer running overnight agents immediately hits a ceiling if their colleague is still reviewing code manually at Level 2. The whole operation slows to match. Individual mastery doesn't compound if the team doesn't move together — which reframes agentic engineering as an organizational challenge as much as a technical one.

Engineering leaders have had no shortage of benchmark numbers to point to, and no vocabulary for explaining why those numbers haven't shown up in their teams' output. Eledath's framework gives them a diagnostic. Whether the specific eight-level structure holds up under scrutiny matters less than the underlying claim: the models are ready, the levels are sequential, and whoever's already climbed the ladder benefits most when the models get better.