AMD has published an official guide showing how to run OpenClaw, an open-source AI agent interface, locally on its Ryzen AI Max processors and Radeon GPUs using the ROCm compute platform. The guide targets AMD's Strix Halo architecture — the silicon underpinning the Ryzen AI Max series — which integrates high-bandwidth unified memory with a discrete-class Radeon GPU in a single SoC. That design lets users run large quantized models without a separate add-in graphics card, making AMD's consumer and prosumer hardware a practical option for private, locally hosted agent workloads.
The featured model is <a href="/news/2026-03-14-canirun-ai-browser-based-hardware-compatibility-checker-for-local-llms">Qwen 3.5 35B</a> in its A3B mixture-of-experts variant, distributed by Unsloth as a GGUF quantized to Q4_K_XL. The A3B designation means roughly 3 billion parameters are active during any given inference pass despite the full 35B parameter count, keeping memory pressure manageable on Ryzen AI Max systems. Inference runs through llama.cpp, using a community-built ROCm 7.2 Docker image (kyuz0/amd-strix-halo-toolboxes) optimized for the Strix Halo architecture. [Editor note: ROCm version, quantization format, Docker image name, and A3B parameter explanation should be verified against the AMD guide and HN thread before publishing.]
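To make the moving parts concrete, the stack described above boils down to running llama.cpp's server inside the ROCm container and pointing it at the GGUF file. The sketch below is illustrative only: the container's entrypoint, the mounted paths, and the model filename are assumptions, not details taken from AMD's guide, so check the guide and the image's README for the exact invocation.

```shell
# Hedged sketch, not AMD's published commands. Assumptions: the image exposes
# llama-server on its PATH, and the GGUF file lives in ~/models on the host.
docker run --rm -it \
  --device /dev/kfd --device /dev/dri \   # pass the AMD GPU devices into the container
  -v "$HOME/models:/models" \             # host directory holding the quantized model
  -p 8080:8080 \
  kyuz0/amd-strix-halo-toolboxes \        # community ROCm image named in the guide
  llama-server \
    -m /models/qwen3.5-35b-a3b-q4_k_xl.gguf \  # hypothetical filename for the Unsloth quant
    -ngl 99 \                             # offload all layers to the Radeon GPU
    --ctx-size 8192 \
    --host 0.0.0.0 --port 8080
```

Once the server is up, OpenClaw (or any OpenAI-compatible client) can be pointed at `http://localhost:8080` as its inference endpoint.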
Hacker News user everlier flagged a faster path using Harbor, a local AI stack manager. Harbor reduces setup to four CLI commands: pull the Unsloth-quantized Qwen model, configure llama.cpp with the ROCm-optimized image, set OpenClaw's default model, and start the stack. What AMD's guide handles step-by-step, Harbor wraps into a scriptable sequence — useful for developers who want repeatable deployments or just don't want to debug Docker environment variables on a Saturday afternoon.
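The four steps everlier described might look something like the following. This is a hypothetical sketch of the workflow, not verified Harbor syntax: the exact subcommands, service names, and model identifiers are assumptions, so consult Harbor's documentation before running any of it.

```shell
# Illustrative only -- command names and arguments are assumed, not confirmed.
harbor pull llamacpp                                # 1. fetch the llama.cpp service with the ROCm image (assumed)
harbor config set llamacpp.model unsloth/qwen-gguf  # 2. point llama.cpp at the Unsloth quant (assumed identifier)
harbor config set openclaw.model qwen3.5-35b-a3b    # 3. make it OpenClaw's default model (assumed identifier)
harbor up llamacpp openclaw                         # 4. start the stack
```

The appeal is less the command count than the repeatability: the same four lines can live in a setup script and be rerun on a fresh machine.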
AMD's willingness to publish first-party setup guides for local agent workloads signals how seriously the company is chasing the developer audience that has so far defaulted to NVIDIA for AI inference. The ROCm toolchain has historically lagged in community support, but projects like Harbor and Unsloth's GGUF distributions are filling gaps that AMD itself hasn't addressed. For anyone prioritizing privacy or offline operation, the combination of Strix Halo's unified memory and this software stack now makes fully on-device agent inference a realistic weekend project rather than a research exercise.