Cumulus Compute Labs, a Y Combinator W26 startup, has publicly launched IonRouter, a high-throughput LLM inference platform built around the company's proprietary IonAttention engine. The core technology multiplexes multiple models on a single GPU simultaneously, enabling millisecond-level model swapping and real-time adaptive traffic scaling. Benchmarks published by the company claim approximately 7,167 tokens per second on a single NVIDIA Grace Hopper GH200 for Qwen2.5-7B — roughly 2.4 times the throughput of leading inference providers at comparable cost. The platform exposes an OpenAI-compatible API endpoint, requiring only a one-line change from existing client code.

The model catalog spans text, image, and video. On the text side, ZhiPu AI's GLM-5 is a 600B+ mixture-of-experts model running via EAGLE speculative decoding on 8x B200 GPUs at around 220 tok/s. MoonShot AI's Kimi-K2.5 covers reasoning tasks, MiniMax-M2.5 offers 1M-context support, and Qwen3.5-122B-A10B rounds out the frontier text options. For media generation, Black Forest Labs' Flux Schnell delivers sub-4-second image outputs and Wan2.2 produces text-to-video clips in under 10 seconds. Custom LoRA and fine-tuned model deployments get dedicated GPU streams, zero cold starts, and per-second billing. Target verticals span robotics real-time vision-language model perception, multi-stream video surveillance, game asset generation, and AI video pipelines.

The Hacker News thread surfaced substantive product gaps. For teams building agentic loops — one of IonRouter's stated target use cases — the absence of published quantization details for hosted models and the lack of cached or prefix input pricing are meaningful gaps. Cached pricing typically matters far more than headline per-token rates for <a href="/news/2026-03-14-context-gateway-llm-compression-proxy">agentic workloads</a> that repeatedly reference long system prompts or fixed context windows; without it, cost modeling for production deployments is incomplete. Commenters also questioned whether IonRouter is operating under the alias "Ionstream" on the OpenRouter aggregator marketplace, a point the company has not publicly addressed. Privacy policy language around prompt storage drew additional scrutiny from potential enterprise buyers handling sensitive data.