A technical essay on the Dead Neurons Substack argues that the AI field has been solving the wrong context problem. Published March 16, 2026, it proposes Agentic Context Management (ACM) — an architecture where the model itself governs its context window using purpose-built tools, rather than relying on inference-engine compaction that operates beneath the model's semantic awareness.

The essay's central concept is "context rot": performance degradation that sets in well before a session hits nominal token limits, as the window fills with stale tool outputs, failed explorations, and intermediate data that has already done its job. "The model doesn't suffer because it ran out of space," the author writes. "It suffers because it can no longer tell the difference between what it learned an hour ago and what it needs right now." That distinction — between raw capacity and semantic relevance — is the load-bearing argument for why ACM puts control at the model layer instead of the infrastructure layer.

The essay positions ACM against two 2026 papers that also grapple with context limits. Recursive Language Models (RLM), by Alex L. Zhang, MIT's Tim Kraska, and Stanford NLP's Omar Khattab (arXiv: 2512.24601), never lets the model directly ingest a user prompt; instead it treats the prompt as a variable in a Python REPL, using llm.query() calls to decompose and recursively process sub-portions. RLM posts strong benchmark numbers on S-NIAH, OOLONG, and BrowseComp for massive static inputs. The ACM essay concedes the results but argues RLM's single-shot architecture was never designed for the problem ACM targets: it has no mechanism for handling the conversational working-memory degradation that accumulates across a live agentic session. The second paper, Lossless Context Management (LCM) by Ehrlich and Blackman, works at the inference engine level with a DAG-based structure and automatic compaction thresholds. The author credits LCM for targeting the right problem, then faults it for the same reason: compaction decisions made below the model layer lack the semantic context to distinguish what's stale from what's still load-bearing.
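The RLM pattern is easier to see in miniature. The sketch below is a loose illustration of the recursive-decomposition idea described above, not the paper's actual runtime: `llm.query()` is the call name the essay mentions, but `StubLLM`, `recursive_process`, and the chunking strategy are all hypothetical stand-ins.

```python
class StubLLM:
    """Hypothetical stand-in for a model endpoint; RLM calls a real LLM."""

    def query(self, instruction: str, text: str) -> str:
        # Fake "summarization": keep only the first sentence of the slice.
        return text.split(".")[0].strip() + "."


def recursive_process(llm: StubLLM, prompt: str, chunk_size: int = 200) -> str:
    # The full prompt is never handed to the model in one shot; it lives
    # as a REPL variable and is queried piecewise.
    if len(prompt) <= chunk_size:
        # Base case: small enough for a direct llm.query() call.
        return llm.query("answer over this text", prompt)
    # Recursive case: split the variable, process sub-portions, merge.
    mid = len(prompt) // 2
    left = recursive_process(llm, prompt[:mid], chunk_size)
    right = recursive_process(llm, prompt[mid:], chunk_size)
    return llm.query("merge these partial results", left + " " + right)
```

The point of the structure, per the essay's framing, is that it handles a massive *static* input elegantly while offering nothing for state that decays over the course of a session.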

The Kraska-Khattab pairing on RLM is where the ACM author's critique finds its sharpest edge. Kraska — Director of Applied Science at AWS, known for the 2018 "Case for Learned Index Structures" paper — approaches context as a queryable data structure. Khattab, creator of DSPy (32,800+ GitHub stars), thinks in terms of declarative programming over model behavior. Together they landed on a predictable synthesis: build the right runtime primitives and the model will manage context correctly. The ACM essay's implicit argument is that both lenses share the same blind spot — they optimize for how context gets structured externally, without asking whether the model's own judgment about relevance is the thing being forfeited. A database systems researcher and a declarative-programming researcher collaborating on context management will design a better container. They won't design a model that knows what to throw away.

For practitioners building long-horizon agents, the gap the essay describes is real. Production agentic sessions routinely hit context degradation before hitting hard token limits — the effective ceiling is lower than the nominal one, and it shifts depending on how much accumulated noise the model is carrying. ACM's prescription is concrete: expose context-management primitives to the model layer and treat working memory as a first-class agentic capability, not an infrastructure concern managed around the model. The next question is whether any major agent framework picks this up, or whether <a href="/news/2026-03-14-1m-token-context-window-generally-available-claude-opus-4-6-sonnet-4-6">engine-side compaction</a> remains the default precisely because it requires no model-layer changes from framework authors.
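What "context-management primitives exposed to the model layer" might look like in practice is left open by the essay; a minimal sketch, assuming nothing beyond the general idea, could be a set of tools the model itself invokes to mark and evict stale working memory. Every name here (`WorkingMemory`, `mark_stale`, `compact`) is illustrative, not from the essay or any framework.

```python
from dataclasses import dataclass, field


@dataclass
class ContextEntry:
    id: int
    content: str
    stale: bool = False


@dataclass
class WorkingMemory:
    """Hypothetical working-memory store whose eviction tools are
    exposed to the model as agent tools, not run by the engine."""

    entries: list = field(default_factory=list)
    _next_id: int = 0

    def add(self, content: str) -> int:
        self._next_id += 1
        self.entries.append(ContextEntry(self._next_id, content))
        return self._next_id

    # Tools the model itself would call:
    def mark_stale(self, entry_id: int) -> None:
        """The model judged this entry no longer load-bearing."""
        for entry in self.entries:
            if entry.id == entry_id:
                entry.stale = True

    def compact(self) -> int:
        """Drop entries the model marked stale; return count removed."""
        before = len(self.entries)
        self.entries = [e for e in self.entries if not e.stale]
        return before - len(self.entries)
```

The design choice this illustrates is the essay's core claim: the judgment about *which* entry is stale happens in `mark_stale`, a decision only the model can make, while `compact` is the mechanical step an engine could run either way.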