AMD has released Lemonade, an open-source local AI server that runs text, image, and speech models on GPUs and NPUs. Built with the local AI community, the project features a 2MB native C++ backend and OpenAI API compatibility. Lemonade supports multiple inference engines — llama.cpp, Ryzen AI SW, and FastFlowLM — letting users run several models at once through a single API for chat, vision, image generation, transcription, and speech synthesis. Windows, Linux, and macOS are all supported.
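Because Lemonade speaks the OpenAI API, any standard chat-completions client should work against it unchanged. The sketch below builds such a request in plain Python; the base URL (`http://localhost:8000/api/v1`) and the model id are assumptions for illustration, not confirmed Lemonade defaults, so check them against your install:

```python
import json

# Assumed local server address and model id -- verify against your setup.
BASE_URL = "http://localhost:8000/api/v1"
MODEL = "Llama-3.2-3B-Instruct-Hybrid"  # hypothetical model id

def build_chat_request(base_url: str, model: str, prompt: str):
    """Construct the URL and JSON body for an OpenAI-style
    /chat/completions call. No network traffic happens here."""
    url = f"{base_url}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })
    return url, body

url, body = build_chat_request(BASE_URL, MODEL, "Hello from Lemonade!")
print(url)
# To actually send it against a running server, wrap it in
# urllib.request.Request(url, data=body.encode(),
#     headers={"Content-Type": "application/json"}).
```

Because the request shape is the stock OpenAI one, swapping engines (llama.cpp, Ryzen AI SW, FastFlowLM) should only change the `model` field, not the client code.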

Unlike Ollama or LM Studio, which focus primarily on serving language models, Lemonade targets a unified runtime across AI modalities. The software auto-configures dependencies for the detected hardware and offers a one-minute installer. AMD hardware users can choose among ROCm, Vulkan, CPU, GPU, and NPU execution paths, a practical option for anyone who has wrestled with ROCm driver setup.

Lemonade plugs into GitHub Copilot, Continue, OpenHands, and Open WebUI via its OpenAI-compatible endpoints. The NPU support has drawn mixed reactions: GitHub issues show users questioning whether NPUs can serve anything beyond small models without falling well behind dedicated GPUs. The project has nonetheless racked up over 2,100 GitHub stars and built an active Discord community since launch.
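Those integrations generally amount to pointing a tool's OpenAI-compatible backend at the local server. As one illustration, a Continue entry in its legacy `config.json` format might look like the fragment below; the base URL, model id, and API key are assumptions for illustration (most local servers ignore the key but clients require one), and field names may differ in newer Continue releases:

```json
{
  "models": [
    {
      "title": "Lemonade (local)",
      "provider": "openai",
      "model": "Llama-3.2-3B-Instruct-Hybrid",
      "apiBase": "http://localhost:8000/api/v1",
      "apiKey": "lemonade"
    }
  ]
}
```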

The founding team — veterans from computer architecture, cloud computing, and hardware design — built Lemonade through the AMD Server Creators Program. Community discussions on GitHub and Discord are already debating whether the unified API delivers true abstraction or simply bundles existing tools. Either way, the project's quick iteration and practical integrations have given local AI developers a new reason to consider AMD hardware.