AMD quietly released Lemonade, an open-source local AI inference server that runs text, image, and speech models on your PC. The project targets AMD hardware owners who have struggled with ROCm driver headaches, and has been in development for nearly a year under GitHub user 'kyuz0', who appears to have deep ties to AMD's engineering team.
The server weighs in at just 2MB thanks to its C++ backend. Setup takes about a minute, and it auto-configures for your hardware whether you're running on GPU, NPU, or CPU. It supports multiple inference engines, including llama.cpp and Ryzen AI SW, and can target either ROCm or Vulkan backends. Users can run multiple models at once through a unified API that handles chat, vision, image generation, transcription, and speech.
Unlike Ollama and LM Studio, Lemonade is designed as a runtime, not just a model server. It handles orchestration across different AI modalities and offers OpenAI-compatible endpoints, meaning it works with hundreds of apps including VSCode Copilot, Open WebUI, n8n, and Dify. The project has already racked up 2,100 GitHub stars.
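Because the endpoints follow the OpenAI wire format, any HTTP client can talk to the server without a dedicated SDK. A minimal sketch of what that looks like, assuming Lemonade listens on localhost:8000 with OpenAI-style routes under /api/v1 (the port, route prefix, and model name here are illustrative assumptions, not confirmed defaults):

```python
import json
import urllib.request

# Assumed local address; adjust to wherever your Lemonade server listens.
BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request; requires a server actually running at BASE_URL."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example call (only works with a live server; model name is hypothetical):
# reply = send(build_chat_request("llama-3.2-1b-instruct", "Hello!"))
# print(reply["choices"][0]["message"]["content"])
```

The same compatibility is what lets tools like Open WebUI or VSCode Copilot work: they only need their base URL pointed at the local server instead of api.openai.com.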
The big question is NPU performance. While discrete GPUs still do the heavy lifting for large models like gpt-oss-120b, the NPU integration could matter more as AMD pushes AI capabilities into consumer chips. For now, Lemonade gives AMD hardware owners a pragmatic option that doesn't require wrestling with ROCm dependencies.