Fields Medalist Terence Tao and Cornell ORIE mathematician Damek Davis have launched the Mathematics Distillation Challenge, a competitive AI benchmark hosted by the SAIR Foundation (Foundation for Science and AI Research). Participants are asked to craft a compact "cheat sheet", capped at 10 kilobytes, that improves the performance of inexpensive open-source LLMs on true-false problems in universal algebra. The challenge draws directly from Tao's 2024 Equational Theories Project (ETP), which combined Lean formal verification with automated theorem provers to resolve over 22 million true-false questions in universal algebra. Stage 1 opened March 13, 2026, and runs until April 20; the top 1,000 submissions advance to a second stage that requires not just true-false answers but proofs or counterexamples.
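To make the submission format concrete, here is a minimal sketch of how a cheat-sheet entry might be validated and prepended to a question prompt. The harness, function names, and prompt template are illustrative assumptions on my part; only the 10-kilobyte cap and the true-false question format come from the challenge description.

```python
# Hypothetical evaluation harness for a cheat-sheet submission.
# Assumptions: the size cap is measured in UTF-8 bytes, and the sheet is
# simply prepended to each question; the real contest API may differ.

MAX_BYTES = 10 * 1024  # the challenge caps submissions at 10 kilobytes


def load_cheat_sheet(text: str) -> str:
    """Reject cheat sheets that exceed the 10 KB cap."""
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("cheat sheet exceeds the 10 KB limit")
    return text


def build_prompt(cheat_sheet: str, question: str) -> str:
    """Prepend the cheat sheet to a true-false question for a small LLM."""
    return (
        f"{cheat_sheet}\n\n"
        f"Question (answer True or False): {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    sheet = load_cheat_sheet("Key facts about magmas and equational laws...")
    prompt = build_prompt(sheet, "Does one equational law imply another?")
    print(prompt)
```

The point of the byte-level check is that 10 KB is a hard budget: every character of background theory, heuristics, or worked examples in the sheet competes for the same space.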
The gap the challenge targets is significant: frontier models handle these problems reliably but are expensive and offer little interpretive transparency, while smaller open-source models currently hover around 50% accuracy — statistically indistinguishable from random guessing on binary questions. Early experiments show that well-crafted prompts or cheat sheets can push smaller model accuracy to roughly 55–60%, and the competition aims to quantify how far that ceiling can be raised. Contestants can test submissions via a provided playground against a public set of 1,200 benchmark problems (1,000 standard, 200 designated "hard"), with evaluation against a private test set following the Stage 1 deadline.
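A quick back-of-envelope calculation, using the public benchmark sizes quoted above, shows why ~50% accuracy is indistinguishable from coin-flipping while 55–60% is a clear signal. This is a standard binomial-variance estimate, not anything from the challenge's official scoring.

```python
# Why 50% accuracy on binary questions is noise, but 55% is signal:
# under random guessing (p = 0.5), accuracy on n questions has standard
# deviation sqrt(0.25 / n).
import math


def random_guess_sigma(n: int) -> float:
    """Std. dev. of observed accuracy under pure random guessing on n binary questions."""
    return math.sqrt(0.25 / n)


n = 1000  # the "standard" portion of the public benchmark set
sigma = random_guess_sigma(n)      # about 0.016, i.e. ~1.6 percentage points
z_at_55 = (0.55 - 0.50) / sigma    # how many sigmas above chance 55% sits

print(f"sigma = {sigma:.4f}")
print(f"55% accuracy is {z_at_55:.1f} sigma above chance")
```

So on 1,000 questions, a model scoring anywhere in roughly the 48–52% band is consistent with guessing, while 55% sits more than three standard deviations above chance, which is why the 55–60% figures from early cheat-sheet experiments represent a genuine, measurable lift.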
The challenge has surfaced a broader debate in AI-for-mathematics circles about prompt engineering and knowledge distillation versus more heavyweight approaches. Commenters on Hacker News have argued that fine-tuning dedicated models or building systems analogous to Google DeepMind's AlphaProof — which achieved silver-medal performance on IMO problems — would yield more powerful results. Tao has framed the distillation challenge as explicitly complementary to such approaches, prioritizing accessibility and community participation over raw performance. Davis, whose research centers on large-scale optimization and machine learning, drew on prior experiments using LLMs for Lean formalization and his time at the Simons Institute's Fall 2024 LLM program when shaping the challenge design with Tao.
The SAIR Foundation, where Tao serves as a board member, is providing technical infrastructure and compute. The organizers have said that meaningful results here could lead to similar challenges across other mathematical problem classes, with additional cooperative challenges planned for the coming months. It's a format Tao has used before — the Polymath projects and collaborative formalization efforts ran on the same premise that opening a hard problem to a structured crowd can surface solutions no small team would find alone.