Independent researcher David Noel Ng took the top spot on the HuggingFace Open LLM Leaderboard v2 in late 2024 with a model called dnhkng/RYS-XLarge, without training, weight merging, or a single step of gradient descent. His method: take Qwen2-72B, an existing 72-billion-parameter open-weight model, duplicate seven consecutive middle layers, and stitch the result back together. The modified model ran on two consumer NVIDIA RTX 4090 GPUs using the ExLlamaV2 quantized inference engine and outperformed well-funded lab submissions across six benchmarks: IFEval, BBH, MATH Lvl 5, GPQA, MuSR, and MMLU-PRO. Ng documented the full discovery process in a blog post published March 10, 2026, more than a year after the result, a level of methodological transparency that drew praise on Hacker News, in a field where polished papers typically obscure the actual path to discovery.
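The layer surgery itself is simple index arithmetic on the model's layer stack. A minimal sketch, treating the 80 decoder layers of a Qwen2-72B-sized model as an ordinary Python list; the start index 40 below is illustrative, not Ng's actual window:

```python
def duplicate_window(layers, start, width):
    """Return a new layer stack with layers[start:start+width] repeated
    once, immediately after the original window."""
    return layers[:start + width] + layers[start:start + width] + layers[start + width:]

# Qwen2-72B has 80 decoder layers; duplicating a 7-layer window yields 87.
base = [f"layer_{i}" for i in range(80)]
grown = duplicate_window(base, start=40, width=7)  # start=40 is a guess, not Ng's choice
```

In a real model the same operation is applied to the module list of transformer blocks (each entry sharing weights with its original), which is why no training is needed: the duplicated window simply re-runs existing computation on the residual stream.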

The conceptual foundation Ng calls "LLM Neuroanatomy" rests on two earlier anomalies he identified. First, sufficiently capable 2023-era language models could correctly answer questions posed entirely in Base64, implying that internal representations are format-agnostic, with early transformer layers acting as input translators and late layers as output translators. Second, a November 2023 community model called Goliath-120B, created by HuggingFace user Alpindale by interleaving layers from two different fine-tuned Llama-2 70B models, produced coherent output despite feeding each model's later-layer activations into the other's earlier layers, a mismatch that should have been statistically catastrophic. Ng's hypothesis synthesized these two clues: middle layers perform universal abstract reasoning and are far more interchangeable than architecture theory would predict. To test it, he built a systematic "brain scanner" that evaluated 3,241 layer-duplication window configurations and identified the optimal seven-layer block to duplicate.
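The scan reduces to an exhaustive sweep over (start, width) duplication windows, with each variant scored by some benchmark harness. A sketch under stated assumptions: `evaluate` is a hypothetical stand-in for whatever scoring Ng ran, and the exact window grid behind his 3,241 configurations isn't specified in this account:

```python
def duplicate_window(layers, start, width):
    """Stack with layers[start:start+width] repeated once in place."""
    return layers[:start + width] + layers[start:start + width] + layers[start + width:]

def window_configs(n_layers, max_width):
    """Enumerate every (start, width) duplication window that fits the stack."""
    for width in range(1, max_width + 1):
        for start in range(n_layers - width + 1):
            yield start, width

def brain_scan(layers, evaluate, max_width):
    """Score every duplicated variant; return the best (start, width) window."""
    return max(window_configs(len(layers), max_width),
               key=lambda cfg: evaluate(duplicate_window(layers, *cfg)))
```

For example, on a toy 10-layer stack with a scoring function that happens to favor 12-layer models, `brain_scan(list(range(10)), lambda m: -abs(len(m) - 12), max_width=3)` returns a width-2 window. The expensive part in practice is `evaluate`, which for Ng meant running full benchmark suites per configuration, which is why the search being tractable at all on consumer GPUs is notable.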

The academic literature has since converged on the same structural insight from a theoretical direction. The SOLAR 10.7B paper (Kim et al., NAACL 2024) demonstrated depth up-scaling, duplicating transformer layers to build a model that outperformed 30B baselines. "The Curse of Depth" (Sun et al., arXiv 2502.05795, NeurIPS 2025) provides a formal explanation: Pre-Layer Normalization causes output variance to grow exponentially with depth, which drives deep layers toward identity functions and concentrates actual computation in the middle layers. A third paper, "Scaling up Test-Time Compute with Latent Reasoning" (Geiping et al., arXiv 2502.05171, NeurIPS 2025), takes the logic further by training a model with a single recurrent reasoning block repeated at inference time. Together, these works suggest that the privileged computational role of middle transformer layers is a deep structural property of current architectures, a conclusion Ng reached empirically, on basement hardware, roughly contemporaneously with researchers at well-resourced institutions.
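The identity-drift mechanism is easy to see in a toy Pre-LN residual stack: each sublayer sees a normalized input, so its output stays O(1) no matter how large the residual stream has grown, and each successive block therefore moves the stream relatively less. The sketch below is a deliberate simplification (the stream here grows only linearly, whereas the paper analyzes exponential variance growth in trained networks, and a shuffle stands in for the sublayer's transform), but the qualitative effect is the same:

```python
import math
import random

random.seed(0)
dim, depth = 256, 48  # toy sizes; real Pre-LN transformers are far larger

def layernorm(x):
    """LayerNorm without learned scale/shift."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + 1e-5) for v in x]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

x = [random.gauss(0, 1) for _ in range(dim)]
ratios = []  # each block's update magnitude relative to the residual stream
for _ in range(depth):
    # Pre-LN: the sublayer reads a *normalized* stream, so its output is O(1)
    # regardless of how large x has grown (random shuffle = stand-in sublayer)
    update = random.sample(layernorm(x), dim)
    ratios.append(norm(update) / norm(x))
    x = [a + b for a, b in zip(x, update)]  # residual add keeps growing the stream

# ratios shrinks with depth: later blocks barely perturb the stream,
# i.e. they approximate identity functions relative to their input.
```

This is the same asymmetry that makes middle-layer duplication cheap: a block whose relative contribution is modest can be re-run on the stream without destabilizing it.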

Ng's background in biotech neuroscience is not incidental to the discovery. His framing of transformer layers in terms borrowed from functional neuroanatomy — "reasoning cortex," "input translators," invariant representations — mirrors classical neuroscience distinctions between sensory, association, and motor cortex, and led him to ask questions about functional localization that mainstream ML researchers typically frame in terms of gradient flow or loss landscapes. His "brain scanner" methodology, systematically varying a spatial window parameter to identify regions of maximum functional contribution, is structurally analogous to receptive field mapping in neuroscience rather than the ablation or probing paradigms more common in mechanistic interpretability research. He was asking "what is this part for?" at a time when the field's own theoretical tools hadn't yet produced a clear answer — and he got there first.