The Development
Axiom, an AI startup founded by Carina Hong, has solved four previously unsolved math problems using its AxiomProver system. The proofs include a five-year-old conjecture from mathematicians Dawei Chen and Quentin Gendron, and a complete proof of Fel's Conjecture involving formulas from Srinivasa Ramanujan's century-old notebooks.
The company combines large language models with proprietary reasoning systems trained to generate proofs in Lean, a proof assistant and programming language used for formal verification. Unlike ChatGPT's recent math work, AxiomProver doesn't just find solutions; it verifies them formally.
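For readers unfamiliar with Lean, here is a minimal sketch of what a formally verified statement looks like. These are toy examples chosen for illustration, not anything produced by AxiomProver, and the theorem names are made up. Lean's checker accepts the file only if every step is justified.

```lean
-- Toy illustration only; not one of Axiom's results.
-- Claim: addition of natural numbers is commutative.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  -- Nat.add_comm is an existing lemma in Lean 4's core library.
  exact Nat.add_comm a b

-- A concrete arithmetic fact, checked by direct computation.
theorem toy_two_plus_two : 2 + 2 = 4 := rfl
```

If either statement were false, or the justification did not match the claim, Lean would reject it rather than return a plausible-looking answer.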
What This Means in Practice
The problems aren't Fields Medal material yet, but they are questions that have blocked working mathematicians. When Chen spent hours prompting ChatGPT for a solution last month, it failed. Axiom's system found the answer by connecting the problem to 19th-century number theory, a link humans had missed.
This matters beyond pure math. The same formal verification techniques apply to software verification and chip design, areas where Nvidia is already investing. Harmonic AI, working in similar territory, recently raised $120M at a $1.45B valuation with Nvidia backing.
The Pattern
Axiom's results fit a broader trend: AI is moving from competition wins to research applications. Since Christmas 2025, 15 Erdős problems have moved from open to solved, 11 of them crediting AI assistance. Terence Tao counts eight with meaningful AI progress, though he notes the tools excel on easier problems and still need human guidance on hard cases.
The startup's approach differs from Google's 2024 AlphaProof by incorporating newer techniques and focusing on end-to-end proof generation. Hong, the CEO, sees commercial applications in high-assurance software where provable correctness beats probabilistic accuracy.
Trade-offs
Math is a natural fit for formal verification because proofs either work or they don't. The challenge is scaling beyond problems where existing theory provides stepping stones. As Tao notes, AI often locates prior work and fills in the gaps rather than reasoning completely independently.
Still, when a system can solve a problem that stumped experts for five years, and verify the solution formally, that's worth watching. The question is how far up the difficulty curve these tools can climb.