Agent development looks nothing like it did a year ago. Features that sold agent builders in 2025, like RAG, memory, tools, web search, and evaluations, are now standard in ChatGPT and Claude. "All these capabilities are now table stakes," writes Andrew Green at n8n, which has been tracking this space. If you're still evaluating agent tools on whether they have RAG or web search, you're looking at the wrong things.

Big players finally woke up. Workday bought Flowise. OpenAI acquired Promptfoo. Google pushed Opal. Microsoft backed Copilot Studio. In a matter of months, visual no-code agent builders went from scrappy startups to enterprise acquisition targets.

Meanwhile, the Model Context Protocol (MCP) that Anthropic championed had a rough year. OpenClaw's implementation suffered from what Green calls "a tendency to delete data and expose ALL the vulnerabilities." Over 2,000 organizations reported data loss. The MCP Foundation issued new security guidelines after the breach, but the protocol's momentum stalled.

What matters now? Deterministic logic and enterprise readiness. Green ran Claude Code's security review 50 times against the same vulnerable app. Sometimes it caught every bug; sometimes it missed several. That variability is tolerable in a coding assistant. For security operations, you need an agent that checks VirusTotal every single time, not one that reasons its way to maybe doing it.

The criteria for evaluating agent tools have shifted accordingly. Integrations matter less than trust. Can the tool handle customer data responsibly? Does it have killswitches, audit logs, and proper sandboxing? Will it behave the same way on run 50 as on run one?
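To make the "every single time" distinction concrete, here is a minimal Python sketch of a workflow where the reputation check is hardcoded rather than left to the model's judgment. Everything in it is an assumption for illustration: `TriagePipeline`, `Verdict`, `lookup`, and `summarize` are hypothetical names, the lookup is a stub standing in for a real VirusTotal query (no actual VirusTotal API is called), and the "LLM" step is a plain function that may only summarize, never skip the check. It also shows the killswitch and audit log the criteria above ask about.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    # Hypothetical result type; a real integration would map a
    # reputation service's response into something like this.
    sha256: str
    malicious: bool


@dataclass
class TriagePipeline:
    lookup: Callable[[str], Verdict]      # stub for a VirusTotal-style hash lookup
    summarize: Callable[[Verdict], str]   # stand-in for an LLM step; summarizes only
    killswitch: bool = False
    audit_log: list = field(default_factory=list)

    def _audit(self, event: str, **details) -> None:
        # Append-only record of every step the pipeline takes.
        self.audit_log.append({"ts": time.time(), "event": event, **details})

    def triage(self, payload: bytes) -> dict:
        if self.killswitch:
            self._audit("blocked_by_killswitch")
            raise RuntimeError("pipeline disabled by killswitch")
        sha256 = hashlib.sha256(payload).hexdigest()
        # Deterministic step: the lookup runs on every input, unconditionally.
        verdict = self.lookup(sha256)
        self._audit("reputation_lookup", sha256=sha256, malicious=verdict.malicious)
        # The action is decided by code, not by the model.
        action = "quarantine" if verdict.malicious else "allow"
        summary = self.summarize(verdict)
        self._audit("decision", sha256=sha256, action=action)
        return {"sha256": sha256, "action": action, "summary": summary}


# Usage with stubbed dependencies (hypothetical known-bad hash list).
KNOWN_BAD = {hashlib.sha256(b"evil").hexdigest()}


def fake_lookup(h: str) -> Verdict:
    return Verdict(h, h in KNOWN_BAD)


def fake_summary(v: Verdict) -> str:
    return "flagged" if v.malicious else "clean"


p = TriagePipeline(fake_lookup, fake_summary)
# Run 50 behaves exactly like run one.
results = [p.triage(b"evil")["action"] for _ in range(50)]
```

The design choice is that the model sits after the fixed checks, so its variability can affect wording but not whether the lookup happens or which action is taken.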