An 80-year-old geometry conjecture just fell. Not to a team of tenured professors with decades of combined expertise. To a general-purpose reasoning model built by OpenAI. On May 20, 2026, the company announced that its AI had disproved a central conjecture in discrete geometry, one first posed by Paul Erdős in 1946. Princeton mathematician Will Sawin refined the AI-generated argument into a human-verified proof. External reviewers have called it "Annals-level" work.
Here is the number that should keep every research funder awake at night. Training a single frontier AI model in 2026 costs somewhere between $500 million and $2 billion. The entire annual budget of the NSF Division of Mathematical Sciences is roughly $260 to $280 million. At those figures, one training run costs about as much as two to eight years of that division's budget. And that model just produced original mathematical insight as a side effect of its general capabilities.
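A back-of-the-envelope check of that comparison, using only the dollar ranges quoted above. The pairing of low and high ends is my own choice, made to bracket the range; nothing here is official budget data.

```python
# Figures quoted in the text, in millions of US dollars.
TRAINING_COST_M = (500, 2_000)   # one frontier training run: low, high
DMS_BUDGET_M = (260, 280)        # NSF DMS annual budget: low, high

def years_of_funding(training_cost_m, annual_budget_m):
    """How many years of the annual budget one training run would cover."""
    return training_cost_m / annual_budget_m

# Most conservative pairing: cheapest run against the largest budget.
low = years_of_funding(TRAINING_COST_M[0], DMS_BUDGET_M[1])
# Most dramatic pairing: most expensive run against the smallest budget.
high = years_of_funding(TRAINING_COST_M[1], DMS_BUDGET_M[0])

print(f"One training run is worth {low:.1f} to {high:.1f} years of DMS funding")
```

Run as written, this prints a range of 1.8 to 7.7 years.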
I do not think this is a story about one proof. It is a story about what happens when the economics of discovery itself get restructured.
The Side-Effect Theorem
Here is the framework. Call it the Side-Effect Theorem: when a byproduct of a massive investment in one domain accidentally solves hard problems in another domain, the second domain's entire incentive structure changes.
One frontier model can now cost more than several years of public math funding.
OpenAI did not build its reasoning model to crack Erdős problems. It built the model to reason generally. The geometry breakthrough was a side effect. That distinction matters enormously. It means the incremental cost of pointing this model at the next open conjecture is trivially small compared to the billions already spent on training. For a private lab, solving famous math problems is now a rounding error on the R&D budget.
Compare that to the traditional path. A mathematician spends years immersed in a subfield. They develop intuition. They try hundreds of approaches. They publish incremental results. They apply for grants in the $100,000 to $400,000 per year range. The entire cycle from problem selection to proof can take a decade or more. The Side-Effect Theorem does not eliminate that path. But it creates a parallel one that runs on a fundamentally different cost curve.
The pattern repeats across history. Semiconductor manufacturing was built for computing, but it accidentally enabled solar panels. GPS was built for military navigation, but it accidentally enabled ride-sharing. When a side effect is powerful enough, it reorganizes the original field. The question is whether AI's mathematical side effects will do the same to pure math.
The Production Function of Discovery
To understand the structural shift, think about what mathematicians actually produce and how. The production function of mathematical research has three inputs: human intuition, time, and verification. For 80 years, the binding constraint on the Erdős unit distance problem was human intuition. Nobody could imagine a construction that beat grids. The model did.
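To make "grids" concrete, here is a minimal sketch of the classical baseline, not of the model's construction. It counts how often each distance occurs among the points of a small square grid. The point of the rescaling step is that whichever distance occurs most often can be declared the unit, so the grid's score is the count for its most popular distance. Function names are my own.

```python
from collections import Counter
from itertools import combinations

def grid(n):
    """All points of the n-by-n integer grid."""
    return [(x, y) for x in range(n) for y in range(n)]

def squared_distance_counts(points):
    """Tally how many point pairs realize each squared distance."""
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[(x1 - x2) ** 2 + (y1 - y2) ** 2] += 1
    return counts

def best_rescaling(points):
    """Return (squared distance, pair count) for the most frequent distance.

    Rescaling the point set so that this distance equals 1 turns the pair
    count into the number of unit distances.
    """
    return squared_distance_counts(points).most_common(1)[0]

for n in (3, 5):
    pts = grid(n)
    print(n, "unscaled:", squared_distance_counts(pts)[1], "best:", best_rescaling(pts))
```

On the 5-by-5 grid the most frequent squared distance is 5, with 48 pairs, against 40 pairs at distance 1. Even at toy sizes, choosing the scale matters, which is the intuition behind using grids as the benchmark.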
This is qualitatively different from every prior computer-assisted result in mathematics. The 1976 Appel-Haken proof of the Four Color Theorem used computers to check 1,936 reducible configurations. Thomas Hales' proof of the Kepler conjecture in 1998 relied on exhaustive computational verification. Those were brute-force searches through known spaces. What the OpenAI model did was generate a genuinely novel construction. Tim Gowers reportedly called it a "landmark" in AI mathematics.
But here is where the contrarian case deserves serious weight. It is unclear whether this capability generalizes. Disproving a conjecture requires finding a single counterexample or a single better construction. Proving deep theorems requires building long chains of novel abstraction. The Langlands program, mirror symmetry, the Riemann hypothesis: these demand conceptual architectures that no AI has demonstrated the ability to construct. Adam Zsolt Wagner's earlier work at Tel Aviv University showed that neural networks could disprove five graph theory conjectures by searching for counterexamples on a five-year-old laptop. The OpenAI result is more sophisticated, but it lives on the same side of the prove-versus-disprove divide.
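The asymmetry is easy to feel in code. The sketch below is a toy, not Wagner's method and not the OpenAI approach: it brute-forces a famous false claim, that n^2 + n + 41 is prime for every n. One witness ends the discussion. No comparable loop proves the claim for all n.

```python
def is_prime(n):
    """Trial-division primality test, fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def first_counterexample(claim, candidates):
    """Return the first candidate for which claim(candidate) is False, else None."""
    for c in candidates:
        if not claim(c):
            return c
    return None

def euler_claim(n):
    """Toy false conjecture: n^2 + n + 41 is prime for every n >= 0."""
    return is_prime(n * n + n + 41)

witness = first_counterexample(euler_claim, range(1000))
print(witness, witness * witness + witness + 41)  # 40 and 1681, which is 41 * 41
```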
The sociology of mathematics adds another layer of friction. Tenure committees at Princeton, Oxford, and IHÉS evaluate candidates on their publication records and invited talks. Journals require human-readable expositions. Referees must understand the argument. Leslie Hogben of Iowa State University drew a sharp line: she welcomes computer disproofs that can be verified, but a computer proof that cannot be checked by hand "would personally cause me some problems." Publication norms, hiring criteria, and grant structures all privilege human creativity. These institutions move slowly.
And yet. The production function is shifting whether institutions acknowledge it or not. If a general-purpose model can generate thousands of candidate constructions per session at a compute cost of $50 to $500, the bottleneck moves from "can we find any promising idea?" to "which of these AI proposals are deep enough to formalize?" The human mathematician's role shifts from explorer to curator. That is not a demotion. Curation of ideas across a vast search space is arguably a higher-order skill. But it is a fundamentally different job description.
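A rough sense of scale, using only two ranges from this piece: the $100,000 to $400,000 annual grant quoted earlier and the $50 to $500 session cost above. This counts compute alone. It ignores salaries, verification labor, and everything else a grant pays for, so treat it as a sketch of the cost curve, not a budget.

```python
GRANT_PER_YEAR = (100_000, 400_000)   # dollars, range quoted earlier in the text
SESSION_COST = (50, 500)              # dollars per model session, quoted above

def sessions_per_grant_year(grant, session_cost):
    """Number of whole model sessions one year of grant money would buy."""
    return grant // session_cost

# Smallest grant at the highest session price, largest grant at the lowest.
fewest = sessions_per_grant_year(GRANT_PER_YEAR[0], SESSION_COST[1])
most = sessions_per_grant_year(GRANT_PER_YEAR[1], SESSION_COST[0])

print(f"{fewest} to {most} sessions per grant-year")
```

That works out to 200 to 8,000 sessions per grant-year, each producing more candidates than one person can read closely. The scarce input is judgment.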
The funding implications follow directly from the Side-Effect Theorem. Private AI labs now have a strong incentive to hire 6 to 50 mathematicians each over the next two to three years. Not because math is their core business, but because solving famous problems generates scientific prestige, strengthens regulatory narratives about producing fundamental value, and occasionally yields patentable insights in cryptography or optimization. The incremental cost is negligible against total R&D budgets measured in billions. Meanwhile, public funding agencies like the NSF face a choice: compete on a playing field where private labs outspend them 10 to 1, or redefine their role entirely.
My read on this: the most likely outcome is a bifurcation. Private labs will dominate the "fast conjecture-testing" layer of mathematical research. Public institutions will retreat to the "deep theory-building" layer that AI cannot yet touch. The risk is that the boundary between those layers keeps moving.
2031
Three signals inside the same shift
Private AI labs outspend public math funding by an order of magnitude.
A single frontier model costs $500M to $2B to train. The NSF Division of Mathematical Sciences operates on roughly $270M per year. Private labs can now treat famous proofs as prestige byproducts while public funders struggle to compete on the same playing field.
Mathematicians shift from explorers to curators of AI-generated ideas.
The production function of discovery is changing. When a model generates thousands of candidate constructions at $50 to $500 per session, the bottleneck moves from ideation to judgment. By 2031, internal math teams of 20 to 40 people at leading labs will aim reasoning models at open problems and translate outputs into publishable proofs.
A handful of AI labs could become the de facto agenda-setters for mathematics.
If three or four leading labs become the primary funders and employers of mathematical talent, they will pursue problems that demonstrate model capability or generate commercial value. Problems with no obvious application, historically the source of the deepest breakthroughs, risk being starved of attention.
Five years from now, here is what the landscape probably looks like.
The asymmetric advantage belongs to whoever controls the best reasoning models. OpenAI, Google DeepMind, Anthropic, and perhaps two or three others will maintain internal math teams. These teams will not look like traditional departments. They will be small, maybe 20 to 40 people, with expertise split between pure mathematics and prompt engineering. Their job will be to aim the models at high-value open problems and translate outputs into publishable proofs.
The compounding effect is what matters most. Each solved problem generates training signal for the next model generation. Each new model generation gets better at mathematical reasoning. This is a flywheel. By 2031, the question will not be whether AI can contribute to mathematics. It will be which problems remain exclusively human.
KPMG's deployment of Claude across 276,000 employees in 2026 signals that knowledge-work automation is scaling fast. Mathematics is a small, prestigious corner of knowledge work, but it is not immune. Informal surveys from 2024 and 2025 suggest that 30 to 50 percent of mathematicians already use LLMs for at least some tasks. That share will likely reach 80 percent or more by 2031.
The concept of shoshin, beginner's mind, becomes strategically essential here. Mathematicians who cling to the identity of "I am the person who has ideas" will struggle. Those who adopt the identity of "I am the person who recognizes which ideas matter" will thrive. The skill shifts from generation to judgment. Only cash is real; the rest is accounting. And in mathematics, only verified proofs are real. The rest is conjecture. AI changes who generates the conjectures. It does not change who decides they matter.
The real risk, the one almost nobody is discussing, is institutional capture. If the three or four leading AI labs become the de facto funders and employers of mathematical talent, they also become the de facto agenda-setters. They will pursue problems that demonstrate model capability or generate commercial value. Problems with no obvious application, the kind that historically produced the deepest breakthroughs, could be starved of attention. It is unclear whether public institutions have the budget or the will to prevent that outcome.
What to Build This Weekend
You do not need a PhD to start exploring how AI interacts with formal reasoning. Here is a concrete weekend project.
First, pick a simple open conjecture from the OEIS (On-Line Encyclopedia of Integer Sequences) or from any "open problems" list in combinatorics. Start small. Something with a clear definition and a finite search space.
Second, use a reasoning model to generate candidate counterexamples or constructions. Write a clear prompt that states the conjecture precisely, defines the terms, and asks for potential counterexamples with explanations. If you get something interesting, verify it by hand or with a simple Python script.
Third, document your process. Use a tool like Vidocu to turn your screen recording of the exploration into a polished walkthrough. This is not just for sharing. It forces you to articulate what the model did well and where it failed.
Fourth, if you want to go deeper, install Lean 4 and try formalizing one small claim from the model's output. You will break things. That is the point. The gap between "the model said this is true" and "Lean accepts this as proven" teaches you more about mathematical rigor than any textbook.
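For a sense of what the smallest possible formalization looks like, here is a sketch in core Lean 4, assuming no Mathlib. The claim is a toy I picked for illustration: the value of n^2 + n + 41 at n = 40 factors as 41 * 41. It is pure arithmetic, so Lean can settle it by evaluation.

```lean
-- Toy claim a model might make: at n = 40 the polynomial n^2 + n + 41
-- equals 41 * 41, so it is not prime there.
example : (40 ^ 2 + 40 + 41 : Nat) = 41 * 41 := by decide

-- A nearby claim that is also true: the value is not 1682.
example : ¬ ((40 ^ 2 + 40 + 41 : Nat) = 1682) := by decide
```

Anything beyond closed arithmetic, such as a statement about all n, needs a real argument rather than `decide`. That is where the gap described above starts to show.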
The goal is not to solve an 80-year-old problem this weekend. The goal is to feel the shape of the new production function. To understand, in your hands, what it means when idea generation gets cheap and verification stays hard. That tension is where the next decade of mathematical research will live. Get your reps in now.
Aim a reasoning model at a small open conjecture and document what happens.
- Pick a target conjecture. Browse the OEIS or any combinatorics "open problems" list. Choose something with a clear, compact definition and a known boundary. Smaller is better for a first attempt.
- Prompt a reasoning model systematically. Use OpenAI's o-series or an equivalent chain-of-thought model. Feed it the conjecture statement and the known bounds, then ask it to search for counterexamples or improved constructions. Log every output, including failures.
- Verify and write up your results. Even a negative result (model fails to find anything new) is informative. If the model proposes a candidate, check it by hand, with a computer algebra system like SageMath, or with a proof assistant like Lean 4. Post your writeup to a public repo so others can replicate.
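The log-and-verify loop can be sketched in a few lines. To keep the checker testable, this example uses a stand-in conjecture that is already known to be false: every odd composite number is a prime plus twice a square. The `proposed` list is hypothetical. In a real run it would be parsed from the model's output, and the checker would be rewritten for your own target.

```python
import json

def is_prime(n):
    """Trial-division primality test, fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_counterexample(n):
    """True if n is an odd composite that is NOT a prime plus twice a square."""
    if n < 9 or n % 2 == 0 or is_prime(n):
        return False  # outside the conjecture's scope
    k = 1
    while 2 * k * k < n:
        if is_prime(n - 2 * k * k):
            return False  # n = prime + 2*k^2, so the conjecture holds for n
        k += 1
    return True

def review(candidates):
    """Check every proposed candidate; return one JSON log line per candidate."""
    return [
        json.dumps({"candidate": n, "verified": is_counterexample(n)})
        for n in candidates
    ]

# Hypothetical model proposals, including ones that should fail.
proposed = [33, 5777, 5778, 5993]
for line in review(proposed):
    print(line)
```

Every candidate gets a log line, including the failures. The checker never trusts the model's explanation; it recomputes the claim from the definition.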
The proof is verified. The funding model is not.
AI just produced original mathematical insight as a side effect of general-purpose training. That single fact restructures the economics of discovery. Private labs will dominate fast conjecture-testing while public institutions retreat to deep theory-building, but the boundary between those layers will keep moving. The institutions that survive will be the ones that redefine their role before the next model generation redefines it for them. Only verified proofs are real. Everything else is conjecture, including the assumption that today's research funding model still works.