
OpenAI just cut inference prices by 30% across its model tiers. GPT-5 Nano now costs $0.05 per million input tokens. In batch mode, that drops to $0.025. A full-paragraph AI response costs less than $0.001. And OpenAI lost an estimated $5 billion in 2025 while earning $3.7 billion in revenue.

Read that again. The company selling you the cheapest intelligence in history is losing roughly $1.35 for every $1 it earns. They are not passing along savings from efficiency. They are subsidizing your AI bill with $110 billion in financing commitments from SoftBank, Microsoft, and others.

This is not a sale. This is a pricing weapon. And it changes how every developer should think about building, budgeting, and betting on AI for the next 24 months.

The Subsidy Sandwich

Here is the framework that explains what is actually happening. I call it the Subsidy Sandwich because developers are getting squeezed between two forces that look like they should cancel each other out but do not.

The top layer: token prices are collapsing. Gartner forecasts another 90% reduction by 2030. The price of a single AI call is approaching zero.

The bottom layer: total AI bills are exploding. The average enterprise AI budget grew from $1.2 million per year in 2024 to $7 million in 2026, per the FinOps Foundation's 2026 State of FinOps Report. Some Fortune 500 companies report monthly inference bills in the tens of millions.

The filling: venture capital and hyperscaler cross-subsidies. OpenAI, Google, Anthropic, and Meta are all pricing inference below cost to capture market share. AI Automation Global describes this as structurally unsustainable. The Subsidy Sandwich means your unit costs are fake, your volume is real, and your future budget is a guess.

Remember this: cheap tokens multiplied by exploding usage equals a bigger bill, not a smaller one. The Oplexa team put it perfectly: "Your AI token costs dropped 280x in two years. Your AI bill went up 320%."
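
To make that arithmetic concrete, here is a minimal sketch of what the Oplexa numbers imply. It assumes the bill is simply unit price times token volume, which ignores model mix and discounts; the function name is mine, not Oplexa's.

```python
def implied_usage_multiplier(price_drop_factor: float, bill_increase_pct: float) -> float:
    """Usage growth implied by a unit-price drop and a total-bill increase.

    Assumes bill = price * usage, so usage_ratio = bill_ratio / price_ratio.
    """
    bill_ratio = 1 + bill_increase_pct / 100   # a 320% increase means 4.2x the old bill
    price_ratio = 1 / price_drop_factor        # 280x cheaper means 1/280 of the old price
    return bill_ratio / price_ratio


multiplier = implied_usage_multiplier(price_drop_factor=280, bill_increase_pct=320)
print(f"Implied token usage growth: {multiplier:,.0f}x")
```

Under that simplification, a 280x price drop paired with a 320% bill increase means token volume grew by more than a thousandfold. The price was never the variable that mattered.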

The Offer Nobody Can Refuse (Until They Have To)

Let me frame this the way it actually works. OpenAI is not running a technology company right now. OpenAI is running a customer acquisition machine funded by other people's money. And the 30% price cut is the offer.

Think about what makes an offer irresistible. You need a massive gap between the perceived value and the price. GPT-5 Nano at $0.05 per million input tokens creates that gap. A developer can build an entire copilot feature, serve thousands of users, and keep inference costs under $50 a month. The value is enormous. The price is almost nothing. Of course you say yes.
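
Here is a rough sketch of how that "under $50 a month" math could work. The input price is the one quoted above. The $0.40 output price is my inference from the $0.20 batch output price quoted later in this piece, and the traffic profile is entirely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD."""
    input_per_million: float
    output_per_million: float


def monthly_cost(pricing: Pricing, users: int, requests_per_user: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly inference cost in USD for a uniform workload."""
    total_requests = users * requests_per_user
    input_cost = total_requests * input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = total_requests * output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


# Input price is from the article; output price is inferred from the batch figure.
NANO = Pricing(input_per_million=0.05, output_per_million=0.40)

# Hypothetical traffic profile, not measured data.
cost = monthly_cost(NANO, users=5_000, requests_per_user=30,
                    input_tokens=1_500, output_tokens=300)
print(f"Estimated monthly cost: ${cost:.2f}")
```

With these made-up numbers the bill lands around $29 a month. Swap in your own traffic before you trust it.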

But here is the damaging admission: the price is a lie. Not because OpenAI is being dishonest. Because the price does not reflect the actual cost of production. At a negative 70% operating margin, they are buying your usage with investor dollars.

Now look at the competitive response. Anthropic slashed Claude 3.5 Sonnet pricing by 67% in a single move in March 2025. DeepSeek has cut API pricing three times in one quarter, per Bloomberg reporting cited by AI Weekly. Google made Gemini 2.0 Flash free for low-volume users in May 2025. Every provider is racing to the bottom.

Vercel's AI Gateway data tells the real story. Gemini 3 Flash overtook Anthropic's models in token traffic in April and May 2026. Price-sensitive builders are already defecting to the cheapest option. But Anthropic still leads in total spend. That means the market is bifurcating: commodity workloads chase price, high-value enterprise use cases stay sticky.

This is the classic split. You have your price buyers and your value buyers. The price buyers will switch providers every quarter chasing the lowest token cost. The value buyers will pay 10x more for reliability, latency guarantees, and model quality. If you are building a product, you need to know which side of that split your customers are on. Because your cost structure depends on it.

My read on this: the current pricing is a loss-leader play, not a permanent state. Both Oplexa and AI Automation Global recommend enterprises plan for 30% to 50% API price increases within 18 to 24 months as capital discipline tightens. The golden goose here is not cheap tokens. The golden goose is the distribution and lock-in that cheap tokens create. OpenAI wants you building on their APIs, using their agent frameworks, integrating their tooling so deeply that switching costs become prohibitive.

The math is not complicated. OpenAI's earlier infrastructure plans called for $1.4 trillion in compute spending by 2030. They revised that down to $600 billion in their latest investor update, per Mexc's summary. But even $600 billion requires massive revenue growth. Their 2030 target is $280 billion in annual revenue. If they do not hit that, prices go up. Period.

Here is what developers get wrong. They budget for today's token price and assume it will stay flat or fall further. That is like budgeting for gas at $1 a gallon because a station is running a promotional price funded by a billionaire. It is unclear whether the current pricing regime survives past 2027 if OpenAI and its competitors cannot close the gap between revenue and costs.

The smart move is not to chase the cheapest tokens. The smart move is to architect for portability. Use abstraction layers. Test multiple providers. Build your product so that switching from GPT-5 Nano to Gemini Flash to a self-hosted model takes days, not months. The developers who win in this market will be the ones who treat today's prices as a temporary subsidy and build accordingly.
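
What does an abstraction layer look like in practice? One possible shape is sketched below. Everything in it is hypothetical: ChatProvider, StubProvider, and complete_with_fallback are names I made up for illustration, and the stubs stand in for real vendor SDK calls.

```python
from typing import Protocol, Sequence


class ChatProvider(Protocol):
    """The only surface the rest of the product is allowed to depend on."""
    name: str

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    """Stand-in for a vendor adapter; a real one would wrap that vendor's SDK."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    def complete(self, prompt: str) -> str:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return f"[{self.name}] {prompt}"


def complete_with_fallback(providers: Sequence[ChatProvider], prompt: str) -> str:
    """Try providers in priority order and return the first successful answer."""
    errors = []
    for provider in providers:
        try:
            return provider.complete(prompt)
        except Exception as exc:  # a real adapter would catch narrower error types
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


providers = [StubProvider("primary", fail=True), StubProvider("secondary")]
answer = complete_with_fallback(providers, "hello")
print(answer)
```

The point of the design is that changing vendors means writing one new adapter and reordering a list, not touching every call site.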

I think the most dangerous assumption in AI right now is that inference will keep getting cheaper forever. It might. But the financial evidence says the current trajectory is funded by losses, not margins. And losses have an expiration date.

2029

Pull back three years. Where does this pricing war fit in the longer arc of technology platform economics?

We have seen this movie before. Amazon Web Services launched S3 in March 2006 at $0.15 per gigabyte per month. By 2016, that price had fallen over 80%. But AWS revenue grew from roughly $3 billion in 2013 to over $25 billion in 2018. The unit price collapsed. The total spend exploded. Sound familiar?

The asymmetric advantage belongs to whoever controls the infrastructure layer when the subsidy era ends. Right now, OpenAI, Google, and Anthropic are all burning cash to establish that position. Nvidia is investing up to $30 billion in OpenAI's latest funding round. Microsoft walked away from up to 2 GW of data center projects. The capital chessboard is shifting fast.

By 2029, I expect three things to be true. First, commodity inference for mid-tier models will be essentially free, bundled into platform fees the way bandwidth is bundled into cloud pricing today. Second, frontier reasoning models will cost more, not less, per useful unit of work. OpenAI's o3 reasoning model is already reported to be significantly more expensive to run than expected. Third, the real cost center will not be tokens. It will be orchestration: the agentic workflows, RAG pipelines, and always-on agents that chain dozens of model calls per user action.

The compounding flywheel here is usage, not price. Cheaper tokens enable more complex architectures. More complex architectures consume more tokens. Total spend rises even as unit costs fall. Inference costs already represent 85% of enterprise AI budgets in 2026, according to AnalyticsWeek data cited by Oplexa.
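
A toy calculation shows the flywheel at work. The call counts, token counts, and prices below are illustrative assumptions, not measurements from any real product.

```python
def cost_per_action(calls: int, tokens_per_call: int, price_per_million: float) -> float:
    """USD cost of one user action that fans out into `calls` model calls."""
    return calls * tokens_per_call * price_per_million / 1_000_000


# Before: one model call per user action at a hypothetical $1.25 per million tokens.
single_call = cost_per_action(calls=1, tokens_per_call=2_000, price_per_million=1.25)

# After: the unit price is halved, but an agentic workflow chains 30 calls per action.
agentic = cost_per_action(calls=30, tokens_per_call=2_000, price_per_million=0.625)

print(f"Single call: ${single_call:.4f} per action")
print(f"Agentic:     ${agentic:.4f} per action")
print(f"Cost per action changed by {agentic / single_call:.0f}x")
```

Halving the price does not help when the architecture multiplies the number of calls by thirty.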

The Siebert Financial blog made a sharp observation in April 2026: OpenAI's revenue miss is not proof that AI demand is weakening. It is proof that competition is heating up. Anthropic is gaining enterprise share. The market is not shrinking. It is fragmenting. And fragmented markets with subsidized pricing eventually consolidate around 2 or 3 survivors who then have pricing power.

The beginner's mind approach here is to stop thinking about AI costs as a line item and start thinking about them as a platform bet. The question is not "how cheap are tokens today?" The question is "which provider's ecosystem will I still be building on in 2029, and what will they charge me then?"

What to Build This Weekend

Stop theorizing. Here is what to do in the next 48 hours.

Step one: audit your current inference spend. Pull your API bills from the last 90 days. Calculate your effective cost per user action, not per token. If you are on OpenAI, check whether you are using prompt caching. Cached input tokens on GPT-5 are $0.125 per million versus $1.25 uncached. That is a 10x difference most teams are leaving on the table.
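
A small sketch of the two numbers worth computing in that audit. The GPT-5 prices are the ones quoted above; the 80% cache hit rate, the bill, and the action count are placeholders for your own data.

```python
def blended_input_price(uncached_price: float, cached_price: float,
                        cache_hit_rate: float) -> float:
    """Effective USD price per million input tokens at a given cache hit rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * cached_price + (1 - cache_hit_rate) * uncached_price


def cost_per_user_action(total_bill_usd: float, user_actions: int) -> float:
    """Effective cost per user action over the audit window."""
    if user_actions <= 0:
        raise ValueError("user_actions must be positive")
    return total_bill_usd / user_actions


# GPT-5 input prices from the article; the hit rate is a hypothetical example.
effective = blended_input_price(uncached_price=1.25, cached_price=0.125,
                                cache_hit_rate=0.8)
print(f"Effective input price: ${effective:.3f} per million tokens")

# Hypothetical 90-day bill and action count.
per_action = cost_per_user_action(total_bill_usd=4_500.0, user_actions=900_000)
print(f"Cost per user action: ${per_action:.4f}")
```

The second number is the one to track over time, because it captures both price changes and how many tokens each feature actually burns.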

Step two: test a second provider. Pick one workflow in your product and run it through Gemini Flash or Claude 3.5 at their current pricing. Measure quality, latency, and cost. You do not need to switch. You need to know your switching cost. If it would take more than a week to migrate one workflow, your architecture is too tightly coupled.
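
One way to structure that test is a tiny harness that records latency, a quality score, and an estimated cost per run. This is a sketch under loud assumptions: fake_provider and contains_refund are stand-ins you would replace with a real SDK call and a real evaluation, and the word-count token estimate is deliberately crude.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TrialResult:
    provider: str
    latency_s: float
    quality: float
    cost_usd: float


def run_trial(provider: str, call: Callable[[str], str], prompt: str,
              score: Callable[[str], float],
              output_price_per_million: float) -> TrialResult:
    """Run one prompt through one provider and record latency, quality, and cost."""
    start = time.perf_counter()
    output = call(prompt)
    latency = time.perf_counter() - start
    # Crude token estimate (whitespace words); a real audit should use billed token counts.
    tokens = len(output.split())
    cost = tokens * output_price_per_million / 1_000_000
    return TrialResult(provider, latency, score(output), cost)


# Stand-in provider and scorer; replace with a real SDK call and a real eval.
def fake_provider(prompt: str) -> str:
    return "refund approved within five days"


def contains_refund(output: str) -> float:
    return 1.0 if "refund" in output else 0.0


result = run_trial("provider-b", fake_provider, "Summarize the ticket.",
                   contains_refund, output_price_per_million=0.40)
print(result)
```

Run the same prompts through each candidate and compare the results side by side. The numbers will be rough, but rough numbers beat none.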

Step three: set up a cost monitoring layer. Tools like Serno can give you multiple perspectives on how your AI spend is trending. Ignyte consolidates the scattered dashboards founders juggle into a single workspace. Pit can learn your specific business operations and automate the cost tracking workflows you are doing manually.

Step four: model three budget scenarios for the next 18 months. Scenario A: token prices fall another 50%. Scenario B: prices stay flat. Scenario C: prices rise 30% to 50% as subsidies end. If your product only works under Scenario A, you have a pricing risk, not a business.
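
The three scenarios fit in a few lines of code. This sketch assumes prices drift smoothly toward their end-of-period level and usage compounds monthly; the $10,000 baseline and 5% monthly growth are hypothetical placeholders.

```python
def project_spend(baseline_monthly: float, end_price_multiplier: float,
                  months: int = 18, monthly_usage_growth: float = 0.0) -> float:
    """Total spend over the horizon, with price moving geometrically to its end level."""
    total = 0.0
    for month in range(1, months + 1):
        price_factor = end_price_multiplier ** (month / months)
        usage_factor = (1 + monthly_usage_growth) ** month
        total += baseline_monthly * price_factor * usage_factor
    return total


# End-of-period price multipliers for the scenarios described above.
SCENARIOS = {
    "A: prices fall 50%": 0.5,
    "B: prices stay flat": 1.0,
    "C: prices rise 30%": 1.3,
    "C: prices rise 50%": 1.5,
}

BASELINE = 10_000.0   # hypothetical current monthly spend in USD
GROWTH = 0.05         # hypothetical monthly usage growth

totals = {name: project_spend(BASELINE, multiplier, monthly_usage_growth=GROWTH)
          for name, multiplier in SCENARIOS.items()}
for name, total in totals.items():
    print(f"{name}: ${total:,.0f} over 18 months")
```

If the gap between Scenario A and Scenario C is larger than your margin, that is the pricing risk in one number.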

Step five: batch everything you can. OpenAI's batch API gives you 50% off both input and output tokens for non-real-time workloads. GPT-5 Nano in batch mode costs $0.025 per million input tokens and $0.20 per million output tokens. If your workflow can tolerate up to 24 hours of latency, you are paying double for no reason.
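
To size the opportunity, estimate what share of your traffic could wait. The sketch below uses the GPT-5 Nano prices and the 50% batch discount quoted above, with the $0.40 standard output price inferred from the $0.20 batch figure. The token volumes and the 60% batch share are hypothetical.

```python
BATCH_DISCOUNT = 0.5  # 50% off input and output tokens, per the article


def monthly_cost_with_batching(input_millions: float, output_millions: float,
                               input_price: float, output_price: float,
                               batch_share: float) -> float:
    """USD cost when `batch_share` of the workload moves to the batch tier."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    standard = input_millions * input_price + output_millions * output_price
    realtime_part = standard * (1 - batch_share)
    batch_part = standard * batch_share * (1 - BATCH_DISCOUNT)
    return realtime_part + batch_part


# Hypothetical workload: 400M input tokens and 50M output tokens per month.
all_realtime = monthly_cost_with_batching(400, 50, 0.05, 0.40, batch_share=0.0)
mostly_batch = monthly_cost_with_batching(400, 50, 0.05, 0.40, batch_share=0.6)
all_batch = monthly_cost_with_batching(400, 50, 0.05, 0.40, batch_share=1.0)

print(f"All real-time: ${all_realtime:.2f}")
print(f"60% batched:   ${mostly_batch:.2f}")
print(f"All batched:   ${all_batch:.2f}")
```

The savings scale linearly with the batched share, so even moving nightly reports and backfills is worth measuring.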

Things will break. Your quality scores will vary across providers. Your latency benchmarks will surprise you. That is the point. The teams that test aggressively now will be the ones who survive the pricing correction that is coming. The teams that assume today's prices are permanent will be the ones scrambling when the Subsidy Sandwich runs out of filling.