Your GPU dashboard has been lying to you. Systalyze, a startup founded by MIT PhD and former Google engineer Manya Ghobadi, just open-sourced Utilyze, a tool that exposes how badly standard monitoring utilities misrepresent GPU performance. The problem is real: nvidia-smi, nvtop, and every major cloud provider's dashboard report "utilization" based on whether any GPU kernel is running, not how much compute that kernel actually uses. An H100 SXM has 16,896 CUDA cores. If a single one is busy, these tools can report 100% utilization while the GPU delivers a fraction of a percent of its potential throughput.
Utilyze fixes this by reading hardware performance counters through NVIDIA's Nsight Perf SDK, measuring the compute work actually done rather than mere kernel activity. From those counters it calculates what Systalyze calls "Attainable SOL" (Speed of Light), a realistic performance ceiling for your specific model and hardware combination. The tool is written in Go, runs on Linux with NVIDIA Ampere GPUs or newer, and operates alongside production workloads with negligible overhead. It currently supports vLLM backends, with SGLang support planned.
The AI industry is burning money on GPUs it doesn't need. Production deployments using Utilyze revealed "orders-of-magnitude performance headroom" in systems that standard tools declared fully saturated. When H100 rental prices jumped nearly 40% between October 2025 and March 2026, teams bought more hardware because their dashboards said they had to. They didn't. As Ghobadi puts it: "The gap isn't awareness. Engineers who write CUDA kernels know what accurate utilization looks like. The gap is tooling."
The tool is still early at version 0.1.3. Users on Hacker News note that it lacks basic system metrics like memory usage, temperature, and fan speed, so it won't replace nvidia-smi entirely yet. That's fine. Utilyze tells you whether your GPU is actually working hard or just keeping a seat warm. If you're spending millions on AI infrastructure, you want to know.