Cloudflare wants to be the switchboard for every AI model you use. The company just rolled out a unified inference infrastructure layer that lets developers call over 70 models from a dozen providers, including OpenAI, Anthropic, Google, and Alibaba Cloud, all through a single API endpoint. Switching between providers takes one line of code. If you're already using Workers AI, you just change the model name in your existing AI.run() call. The platform also handles unified billing across all providers, so you're not juggling separate invoices from every model company.
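As a sketch of what that one-line switch looks like: the standalone `run()` function below is an illustrative stand-in for the `env.AI.run()` binding inside a real Worker, and the provider-prefixed model ID is an assumed example, not a confirmed catalog name.

```typescript
// Hypothetical stand-in for the Workers AI binding: inside a real Worker
// this would be env.AI.run(model, input) going through the unified endpoint.
async function run(model: string, input: { prompt: string }): Promise<string> {
  // Echo locally for illustration; the real call is a network request.
  return `[${model}] response to: ${input.prompt}`;
}

async function main() {
  // Same call shape throughout; switching providers is just a different
  // model string (both IDs here are illustrative).
  const fast = await run("@cf/meta/llama-3-8b-instruct", { prompt: "classify this ticket" });
  const smart = await run("anthropic/claude-sonnet", { prompt: "plan the refund steps" });
  console.log(fast);
  console.log(smart);
}

main();
```

The point is that nothing else in the calling code changes: the input shape, the response handling, and the billing path stay the same across providers.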

The pitch is aimed squarely at agent builders. Agents chain multiple model calls together to complete tasks. A customer support bot might use a cheap fast model to classify intent, then a heavy reasoning model to plan actions. When one provider slows down or drops a request in that chain, the whole thing stalls. Cloudflare's AI Gateway adds automatic failover between providers and routes traffic through its 330-city network to cut time to first token. That's the metric that makes an agent feel responsive or sluggish to a user watching it think.
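A minimal sketch of that failover pattern, using a hypothetical `withFailover` helper and mock providers rather than the gateway's actual configuration surface:

```typescript
type Provider = (prompt: string) => Promise<string>;

// Try each provider in order; return the first successful response.
async function withFailover(providers: Provider[], prompt: string): Promise<string> {
  let lastErr: unknown = new Error("no providers configured");
  for (const provider of providers) {
    try {
      return await provider(prompt);
    } catch (err) {
      lastErr = err; // provider failed or timed out; fall through to the next one
    }
  }
  throw lastErr; // every provider failed
}

// Mock providers for illustration: the first fails, the second answers.
const flaky: Provider = async () => { throw new Error("timeout"); };
const backup: Provider = async (p) => `backup answered: ${p}`;

withFailover([flaky, backup], "classify intent").then(console.log);
```

The gateway applies this kind of logic (plus timeouts and retries) transparently across real providers, so a stalled step in an agent's chain fails over instead of stalling the whole task.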

The Replicate team is also joining Cloudflare to help build out custom model hosting. Developers can package their own fine-tuned models using Replicate's Cog container format and push them directly to Workers AI. Cloudflare handles deployment and serving. This puts direct pressure on AWS Bedrock and Azure OpenAI Service, which tie you more tightly to their ecosystems. Cloudflare is betting that developers want flexibility over lock-in, especially when the best model for a given task changes every few months. Community reaction to the architecture has been positive, though some developers have flagged pricing transparency concerns and noted that Zero Data Retention isn't enabled by default across all providers.
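For context, a Cog package is driven by a `cog.yaml` manifest that declares the container build and points at a predictor class. A minimal sketch (the Python version and package pin below are illustrative, not requirements):

```yaml
# cog.yaml — declares the container build and the predictor entrypoint
build:
  python_version: "3.11"
  python_packages:
    - "torch==2.3.0"   # illustrative pin
predict: "predict.py:Predictor"
```

Because the manifest fully describes the environment, the same package that runs on Replicate can, in principle, be pushed to Workers AI without a separate deployment config.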