Poolside just released two models built for long-running agent tasks. Laguna M.1 is the big one: 225B parameters (23B active), trained from scratch on 30 trillion tokens across 6,144 NVIDIA Hopper GPUs. It scores 46.9% on SWE-bench Pro and 40.7% on Terminal-Bench 2.0. The smaller Laguna XS.2 runs 33B total with just 3B active parameters but still hits 44.5% on SWE-bench Pro. XS.2 is also Poolside's first open-weight release, licensed under Apache 2.0.

The competition is stiff. Qwen3.6-35B-A3B, whose parameter profile nearly matches XS.2's, outperforms both Laguna models in Poolside's own comparison tables: 49.5% on SWE-bench Pro versus XS.2's 44.5%, and 51.5% on Terminal-Bench 2.0 versus M.1's 40.7%.

It's rare to see a company publish head-to-head comparisons where they don't come out on top.

Poolside is betting on a different approach to agents. Rather than structured tool calling, they want agents to write and execute code directly. Code is more expressive: an agent can compose actions, branch, and parallelize work, instead of emitting one predefined function call at a time. Alongside the models, Poolside released its Agent Client Protocol (ACP) server, the same framework it uses internally for reinforcement learning training. Early users testing through Poolside's "pool" agent in editors like Zed report good speed and spec compliance.
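To make the contrast concrete, here is a minimal sketch of the idea. The tool functions (`read_file`, `run_tests`) are hypothetical stubs, not Poolside's actual API: in a structured tool-calling setup each would be a separate model-to-runtime round trip, while a code-writing agent can compose and parallelize them in a single emitted program.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for an agent's available actions. In structured
# tool calling, each invocation is one JSON call and one round trip.
def read_file(path: str) -> str:
    return f"contents of {path}"  # stub

def run_tests(target: str) -> str:
    return f"tests passed for {target}"  # stub

# A code-writing agent can instead emit one program that composes both
# actions and runs the independent reads in parallel:
def agent_step(paths: list[str]) -> list[str]:
    with ThreadPoolExecutor() as pool:
        sources = list(pool.map(read_file, paths))  # parallel reads
    # Compose: results of one action feed directly into the next,
    # with no intermediate round trip through the model.
    return [run_tests(p) for p, _ in zip(paths, sources)]

print(agent_step(["a.py", "b.py"]))
```

The point of the sketch is the control flow, not the stubs: loops, error handling, and fan-out come for free in code, whereas a tool-calling loop pays a model round trip for each action.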

Founded by former GitHub CTO Jason Warner and source{d} creator Eiso Kant, Poolside has raised over $136 million and spent years focused on government clients with strict security requirements. Now it's giving away model weights under Apache 2.0, a clear bid for broader developer adoption. That's the real story: a well-funded player publishing honest benchmarks even when it loses.