Open-Source GreenBoost Driver Extends NVIDIA GPU VRAM with System RAM and NVMe for Larger LLMs

Independent developer Ferran Duarri released GreenBoost on March 14, 2026: an open-source, GPLv2-licensed Linux kernel module that transparently extends NVIDIA GPU VRAM by tiering in system DDR RAM and NVMe SSD storage. The project was first reported by Phoronix. It has not yet published direct benchmark comparisons against CPU-offload methods, a caveat worth keeping in mind before drawing conclusions about real-world performance.

The problem Duarri set out to solve was specific: running a 31.8GB quantized LLM (glm-4.7-flash:q8_0) on a GeForce RTX 5070 with only 12GB of dedicated VRAM. The usual workarounds both carry costs. Heavier quantization degrades output quality, and CPU-offload approaches lose token throughput, which the project attributes to CUDA coherence limitations. GreenBoost instead acts as a transparent memory extension that requires no modification to existing CUDA applications.
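The size of the mismatch is easy to quantify. Counting only model weights, and using the decimal gigabyte figures above, the minimum amount that cannot reside in VRAM works out as:

```latex
\begin{aligned}
\text{weights outside VRAM} &\geq 31.8\,\text{GB} - 12\,\text{GB} = 19.8\,\text{GB} \\
\text{share of weights outside VRAM} &\geq \frac{19.8\,\text{GB}}{31.8\,\text{GB}} \approx 0.62
\end{aligned}
```

So at least 62% of the weights must live somewhere other than VRAM. The real share is higher, because the CUDA context, activations, and KV cache also occupy VRAM.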

GreenBoost has two components. The kernel module (greenboost.ko) allocates pinned DDR4 memory in 2MB compound pages, each spanning 512 standard 4KB pages, which cuts per-page overhead. It exports these pages as DMA-BUF file descriptors, which CUDA then imports as external memory via cudaImportExternalMemory. The result is that system RAM appears to the CUDA runtime as device-accessible memory, with data moving over PCIe 4.0 x16 at a theoretical peak of roughly 32 GB/s in each direction.

A companion CUDA shim library (libgreenboost_cuda.so), injected via LD_PRELOAD, intercepts cudaMalloc and related allocation calls, routing large allocations through the extended tier while keeping small ones in native VRAM. The approach is designed to sidestep the page-fault overhead that makes CUDA Unified Memory (cudaMallocManaged) perform poorly under oversubscription on x86 systems, where NVIDIA's own documentation acknowledges that performance can degrade dramatically depending on access patterns.

GreenBoost targets a real gap in NVIDIA's official product lineup. NVIDIA's Extended GPU Memory (EGM) feature provides high-bandwidth CPU-GPU memory sharing, but it is architecturally restricted to Arm-based Grace Hopper Superchip data center platforms costing tens of thousands of dollars per node. GPUDirect Storage, which enables direct NVMe-to-GPU data paths, is similarly gated to professional workstation (formerly Quadro) and data center GPUs; on consumer GeForce cards it falls back to a slower compatibility mode that routes data through the CPU. For a consumer RTX owner on x86 Linux, no NVIDIA-official mechanism efficiently extends VRAM for LLM inference, and GreenBoost is a direct attempt to change that.

The project remains experimental. Hacker News commenters flagged the missing benchmarks against CPU-offload methods as a significant open question: it has not yet been shown empirically whether PCIe-mediated access to pinned DDR pages actually outperforms the established CPU-offload techniques used by tools such as llama.cpp. The code is hosted on GitLab under GPLv2 and is described as complementary to, rather than a replacement for, NVIDIA's official Linux drivers.