OpenAI's GPT-5.4 Pro has become the first AI system to solve a genuine open problem in mathematics, cracking a Ramsey-style hypergraph problem from Epoch AI's FrontierMath benchmark. The problem had previously resisted attempts by five to ten expert mathematicians and was estimated to require one to three months of expert human effort. Contributed by Will Brian, Associate Professor at UNC Charlotte, it concerns <a href="/news/2026-03-14-alphaevolve-llm-agent-improves-lower-bounds-for-five-classical-ramsey-numbers">improving lower bounds</a> on the sequence H(n) — the maximum vertex count of a hypergraph with no isolated vertices and no partition larger than size n. The solution was elicited by Epoch AI researchers Kevin Barreto and Liam Price and confirmed correct by Brian, who plans to write it up for publication in a specialty journal.

Rather than extending the previously known binary recurrence construction, GPT-5.4 Pro introduced a generalized t-ary harmonic composition framework, proving that H(n) ≥ c_t · n ln n − O(n) for every fixed integer t ≥ 2. Taking t to infinity yields H(n) ≥ (1 − o(1)) · n ln n — a strictly stronger asymptotic lower bound than the previously known ½ n log₂ n − O(n). Brian noted the result "eliminates an inefficiency in our lower-bound construction" and that the matching lower and upper bounds are "unusually tight for Ramsey-theoretic problems." The solution is fully constructive, with explicit certified witnesses for n = 15 through 25. Brian has indicated he may pursue follow-on work inspired by the AI's generalized framework.
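To see why the new bound is strictly stronger, note that the old bound ½ n log₂ n can be rewritten in natural-log terms as n ln n / (2 ln 2) ≈ 0.72 n ln n, whereas the new bound approaches n ln n with leading coefficient 1. A quick sanity check of that coefficient comparison (illustrative only, not part of the proof):

```python
import math

# Previously known lower bound: (1/2) * n * log2(n) - O(n).
# Rewriting log2 in natural-log terms: (1/2) n log2 n = n ln n / (2 ln 2).
old_coeff = 1 / (2 * math.log(2))  # leading coefficient of the old bound

# New bound: H(n) >= (1 - o(1)) * n * ln n, i.e. leading coefficient 1.
new_coeff = 1.0

print(f"old leading coefficient: {old_coeff:.4f}")
print(f"new leading coefficient: {new_coeff:.4f}")
print(f"improvement factor: {new_coeff / old_coeff:.4f}")
```

Since 1/(2 ln 2) ≈ 0.7213 < 1, the t-ary construction improves the leading constant of the lower bound by roughly 39% in the limit.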

Following GPT-5.4 Pro's initial solve, Epoch AI completed a general scaffold for testing frontier models on FrontierMath Open Problems, and three additional models subsequently solved the same problem: <a href="/news/2026-03-14-1m-token-context-window-generally-available-claude-opus-4-6-sonnet-4-6">Anthropic's Opus 4.6</a>, Google's Gemini 3.1 Pro, and OpenAI's GPT-5.4 at its highest compute setting. All three reached the correct result independently. That outcome matters because FrontierMath was designed specifically to test whether models can produce novel, verifiable mathematics — not just perform well on problems with known solutions. Four models from three labs clearing that bar on the same problem is a concrete data point, not a benchmark anomaly.

The case raises unresolved questions about attribution in AI-assisted research. Barreto and Price, who contributed no mathematical derivation but designed the prompting strategy and verified the output, have been offered coauthorship on any resulting papers — a role with no clear precedent in pure mathematics publishing. GPT-5.4 Pro itself receives no authorship consideration, consistent with policies from Nature, Science, and the ICMJE Vancouver Criteria, which require authors to bear legal and ethical accountability that AI systems cannot assume. As Epoch AI accumulates additional solved open problems across model families, this case may become a reference point for how specialty mathematics journals formalize disclosure and attribution policies for AI-assisted proofs.