A paper published in the journal Machine Learning by Bei Zhou and Soren Riis has identified a systematic failure mode in AlphaGo and AlphaZero-style self-play training: the approach collapses on an entire category of games known as "impartial games," with Nim as the canonical example. The same class of AI that reached superhuman performance in chess, Go, and shogi cannot reliably beat a novice human at a game played with matchsticks.
In impartial games, both players share the same pieces and move under identical rules — unlike chess, where each side controls distinct forces. Nim's theoretical weight is considerable: the Sprague-Grundy theorem establishes that any position in any impartial game can be mapped to a Nim configuration. Failure on Nim means failure across the entire class.
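The Sprague-Grundy correspondence can be made concrete with a toy impartial game. The sketch below (not from the paper; the move set is an illustrative choice) computes Grundy numbers for a single-pile subtraction game where a player may remove 1, 2, or 3 stones. Each position then behaves exactly like a Nim pile whose size is its Grundy number.

```python
from functools import lru_cache

# Illustrative subtraction game: remove 1, 2, or 3 stones from one pile.
MOVES = (1, 2, 3)

@lru_cache(maxsize=None)
def grundy(n: int) -> int:
    """Grundy number via mex: the smallest non-negative integer
    not among the Grundy numbers of positions reachable from n."""
    reachable = {grundy(n - k) for k in MOVES if k <= n}
    g = 0
    while g in reachable:
        g += 1
    return g

# Positions with Grundy number 0 are losses for the player to move.
print([grundy(n) for n in range(8)])  # → [0, 1, 2, 3, 0, 1, 2, 3]
```

The periodic 0-1-2-3 pattern shows why the theorem has bite: a position of Grundy number g in any impartial game is interchangeable with a Nim pile of g matchsticks.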
The technical problem is the parity function. Evaluating a Nim position, and finding the winning move from it, requires computing a bitwise XOR across the matchstick row counts — a discrete, symbolic operation. AlphaZero-style systems learn by building probability maps that link board states to win likelihoods through self-play. That works well when winning conditions carry rich positional structure, as in chess or Go. It breaks down when success depends on an abstract mathematical function rather than learnable spatial patterns. Zhou and Riis found that on a seven-row Nim board, a fully trained system's performance became statistically indistinguishable from random play.
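The XOR computation itself is a few lines of code — standard Nim theory, sketched here with a hypothetical seven-row position rather than any board from the paper. The XOR of all row counts (the "Nim-sum") is zero exactly when the player to move is lost; otherwise a winning move exists.

```python
from functools import reduce
from operator import xor

def nim_sum(rows):
    """Bitwise XOR across all row counts."""
    return reduce(xor, rows, 0)

def winning_move(rows):
    """Return (row_index, new_count) that leaves a zero Nim-sum,
    or None if the position is already lost for the mover."""
    s = nim_sum(rows)
    if s == 0:
        return None  # every legal move leaves a non-zero Nim-sum
    for i, r in enumerate(rows):
        target = r ^ s       # XOR out this row's contribution
        if target < r:       # legal only if the row shrinks
            return i, target
    return None

rows = [1, 2, 4, 5, 7, 9, 11]          # hypothetical 7-row position
print(nim_sum(rows))                   # → 7 (non-zero: mover can win)
i, new = winning_move(rows)
after = rows[:]; after[i] = new
print(nim_sum(after))                  # → 0 (opponent now lost)
```

The trivial exactness of this function is the point of contrast: a rule a novice can apply by hand is the same rule the trained network fails to approximate from state-outcome correlations.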
More training data and compute won't fix it. "AlphaZero excels at learning through association," the researchers write, "but fails when a problem requires a form of symbolic reasoning that cannot be implicitly learned from the correlation between game states and outcomes." The paper also notes that chess-playing AIs occasionally show analogous blind spots — missed long mating combinations — suggesting Nim-like failure conditions, while rare, are not confined to toy games. Writing for Ars Technica, science journalist John Timmer connected the findings to previously documented adversarial weaknesses in Go AIs, where positions that defeat strong AI systems are trivially handled by beginners.
The gap matters for anyone building agents that do mathematics, formal verification, or scientific reasoning. Self-play and gradient-based associative learning have driven most of the headline results in game-playing AI, but Zhou and Riis now offer a clean empirical demonstration of where that approach stops working. Agent developers who assume reinforcement learning will eventually learn to handle symbolic computation need to account for this. The paper does not offer a fix — only a precise diagnosis.