AMD shipped NPU silicon inside every Ryzen AI processor going back to 2023. The AMDXDNA accelerator driver made it into the mainline kernel with the Linux 6.14 release. And then, for the better part of two years, essentially nothing happened. Even AMD's own GAIA software quietly routed around the NPU, falling back to Vulkan-based iGPU acceleration: a polite admission that the user-space ecosystem was nowhere near ready.

Lemonade Server 10.0 changes that picture, at least on the inference side. The release brings NPU-accelerated LLM inference and Whisper speech recognition to Linux, powered by FastFlowLM 0.9.35, a new runtime built specifically for Ryzen AI silicon. It supports context lengths up to 256,000 tokens, covers all Ryzen AI 300 and 400 series SoCs, and carries one practical catch: you'll need either the Linux 7.0 kernel or back-ported AMDXDNA driver fixes, since the accelerator driver changes it depends on landed only just before the release.
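
Lemonade Server exposes an OpenAI-compatible REST API, so any stock client can exercise the NPU path once the server is running. Here is a minimal sketch in Python; the port, the /api/v1 prefix, and the model id are assumptions rather than confirmed defaults for this release, so check your install's documentation:

```python
# Minimal sketch: chat completion against a local Lemonade Server via
# its OpenAI-compatible REST API. The base URL and model id below are
# assumptions -- verify them against your lemonade-server instance.
import requests

BASE = "http://localhost:8000/api/v1"  # assumed default Lemonade endpoint

resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": "Llama-3.2-3B-Instruct-Hybrid",  # hypothetical model id
        "messages": [{"role": "user", "content": "Hello from the NPU."}],
        "max_tokens": 128,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```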

One feature will land differently for this readership: Lemonade 10.0 ships with native Claude Code integration. For developers running air-gapped or privacy-sensitive workflows, pointing Anthropic's coding agent at a model served entirely from local NPU hardware, with no cloud round-trips and no data egress, is a meaningfully different setup from anything previously available on Linux.
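
The release notes don't spell out the wiring, but Claude Code supports redirecting its API traffic through the ANTHROPIC_BASE_URL environment variable, and the same Anthropic-style messages call works from any client. A minimal sketch with the anthropic Python SDK, assuming Lemonade answers Anthropic-format requests on localhost; the URL, the placeholder key, and the model id are all assumptions:

```python
# Minimal sketch: pointing an Anthropic-style client at a local
# Lemonade endpoint instead of Anthropic's cloud. Base URL and model
# id are assumptions -- consult Lemonade 10.0's Claude Code docs.
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8000",  # assumed local gateway URL
    api_key="not-needed-locally",      # placeholder; nothing leaves the machine
)

reply = client.messages.create(
    model="local-npu-model",  # hypothetical id for a FastFlowLM-served model
    max_tokens=256,
    messages=[{"role": "user", "content": "Explain this stack trace."}],
)
print(reply.content[0].text)
```

Exporting the same base URL before launching the claude CLI should, in principle, route the full coding-agent loop through the NPU-backed server with no code changes.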

AMD is also launching the Ryzen AI Embedded P100 and Ryzen AI PRO 400 series alongside this release, both targeting embedded and enterprise customers who predominantly run Linux. Those markets have had the NPU silicon for a while; the missing piece has always been software that actually uses it. Whether Lemonade and FastFlowLM hold up as a production-ready foundation or still need enterprise hardening remains an open question. The Phoronix team is planning benchmarks on Framework Desktop and Strix Point hardware shortly, which should give a clearer read on real-world throughput.