The Token Trap

Uber's CTO Praveen Neppalli Naga burned through his entire 2026 AI budget before the year hit its midpoint. Not on hiring. Not on infrastructure. On tokens. According to The Information's May 2026 report, the bill for API calls alone outpaced every projection his team had modeled. Over at Nvidia, VP of applied deep learning Bryan Catanzaro told Axios something even more jarring: compute costs for his team now exceed what they spend on the humans writing the code.

Let that math sink in. The tool costs more than the people it was supposed to replace.

This is not a budgeting problem. This is a measurement problem. When engineering teams track token usage as a primary KPI, they optimize for token usage. Not for outcomes. Not for shipped features. Not for revenue. They optimize for the number that gets reported in the Monday standup. And that number is eating companies alive.

Here is what went wrong, why it mirrors a 50-year-old economics principle, and what the fix looks like.

The Meter Trap

There is a name for what is happening across AI engineering teams in 2026. Call it The Meter Trap: when the unit of billing becomes the unit of performance, teams engineer for the meter instead of the mission.

TOKEN ECONOMICS · MAY 2026
THE INFORMATION · LATENT SPACE · PYMNTS · MEXC

Four numbers that expose the AI measurement crisis.

40%: RAG waste tokens (Latent Space, Dec 2025 analysis)
70%: Subsidized AI agent activity (MEXC, May 2026 report)
75-90%: Bill reduction via caching (Latent Space, prompt compression estimate)
90%: Companies planning an AI increase (Storyblok, Dec 2025 survey)

The pattern is simple. Token spend is easy to measure. Value created is hard to measure. So organizations default to the easy number. Dashboards light up with throughput stats. Leaderboards rank engineers by consumption. The entire incentive structure silently rotates from "build something useful" to "burn tokens visibly."

This is Goodhart's Law in its purest form: when a measure becomes a target, it stops being a good measure. Lines of code did this to software teams in the 1990s. Ticket velocity did it to agile shops in the 2010s. Token counts are doing it to AI teams right now.

The Meter Trap has three moving parts. First, the metric is visible and simple. Second, the metric is tied to perceived productivity. Third, nobody builds the feedback loop that connects the metric to actual business outcomes. Without that loop, the meter runs unchecked.

Inside the Recursive Token Loop

The most expensive failure mode in AI engineering today has a name: Recursive Token Loops. These are agentic systems that get stuck cycling through the same subtasks, revalidating outputs, re-querying context, and burning hundreds or thousands of dollars in a single session before any human notices. There is no manager to tell the agent it is wasting money. The meter just runs.
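One way to put a manager in the loop is a per-session guard that trips on either runaway spend or a repeated subtask. The sketch below is illustrative only: `SessionGuard`, `run`, and both default limits are hypothetical names and values, and it assumes the agent framework can report a subtask label and a dollar cost for each step.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionGuard:
    # Both limits are placeholder assumptions, not recommended values.
    max_cost_usd: float = 25.0
    max_repeats: int = 3
    spent_usd: float = 0.0
    seen: Counter = field(default_factory=Counter)

    def record(self, subtask: str, cost_usd: float) -> None:
        """Call once per agent step, after the model call has been billed."""
        self.spent_usd += cost_usd
        self.seen[subtask] += 1

    def stop_reason(self) -> Optional[str]:
        """Return why the session should halt, or None if it may continue."""
        if self.spent_usd > self.max_cost_usd:
            return "cost ceiling exceeded"
        for subtask, count in self.seen.items():
            if count > self.max_repeats:
                return f"subtask {subtask!r} repeated {count} times"
        return None


def run(steps, guard):
    """Feed (subtask, cost_usd) steps to the guard until done or it trips."""
    for subtask, cost_usd in steps:
        guard.record(subtask, cost_usd)
        reason = guard.stop_reason()
        if reason:
            return reason  # hand control back to a human here
    return None
```

The design choice worth noting is that the guard stops the session rather than logging a warning. A warning is one more dashboard nobody reads.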

"When the unit of billing becomes the unit of performance, teams engineer for the meter instead of the mission. This is Goodhart's Law in its purest form. Lines of code did this to software teams in the 1990s. Token counts are doing it to AI teams right now." (KODA ANALYSIS · MAY 2026)

This is a systems problem, not a people problem. Systems problems require systems thinking.

Start with the input side. Engineers stuff prompts with redundant context to "ensure completeness." According to a December 2025 analysis from Latent Space, up to 40% of total tokens in typical RAG workflows qualify as waste tokens, meaning they add no informational value to the model's output. They exist because the system was designed to be thorough, not efficient. Thoroughness looks good in a demo. Efficiency is invisible.
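A rough way to see how much of a prompt is redundant is to deduplicate retrieved chunks before they are stuffed into context and compare the size before and after. This is a minimal sketch under loud assumptions: it only catches exact repeats after whitespace and case normalization, it counts whitespace-separated words instead of real tokens, and it is not the method behind the Latent Space figure. The function names are hypothetical.

```python
def dedupe_chunks(chunks):
    """Drop retrieved chunks whose normalized text has already been seen."""
    seen = set()
    kept = []
    for chunk in chunks:
        key = " ".join(chunk.lower().split())  # collapse whitespace, ignore case
        if key and key not in seen:
            seen.add(key)
            kept.append(chunk)
    return kept


def rough_tokens(text):
    # Word count is a crude stand-in for a real tokenizer (assumption).
    return len(text.split())


def waste_share(chunks):
    """Fraction of the rough token count removed by deduplication."""
    before = sum(rough_tokens(c) for c in chunks)
    after = sum(rough_tokens(c) for c in dedupe_chunks(chunks))
    return 0.0 if before == 0 else (before - after) / before
```

Near-duplicates and irrelevant-but-unique chunks slip through this check, so treat the result as a floor on waste, not an estimate of it.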

Now look at the output side. Frontier models like Claude 3.5 Sonnet, which scores 87.6% on SWE-bench Verified as of April 2026, produce responses 3 to 4 times longer than efficiency-tier models like Llama 3. Engineers select frontier models for stakeholder demos because verbose outputs feel smarter. Same task. Up to four times the output tokens to pay for. No measurable difference in the quality of the merged pull request.

The organizational dynamics compound the technical waste. PYMNTS reported in May 2026 that Amazon employees were using the company's internal AI tool, MeshClaw, to delegate unnecessary tasks to AI agents specifically to inflate their token consumption scores on internal leaderboards. The Financial Times confirmed that some teams had set 80% weekly AI tool usage targets. Engineers gamed the target. They built what you might call "cobra farms," a reference to the colonial-era bounty on cobras in Delhi that caused people to breed cobras for the reward.

This is the most predictable failure in enterprise AI right now. Every system that rewards input volume over output quality eventually produces this exact dysfunction. The fix is never "try harder" or "be more disciplined." The fix is changing what you measure.

The better measurement frameworks already exist. They just require organizations to do the harder work of connecting spend to outcomes.

Cost per Successful PR. This is the metric aiinsightsnews.net calls the 2026 gold standard. Take your total token spend and divide it by the number of merged, production-ready pull requests. Uber's internal benchmark targets $500 per PR. Claude Code hits $720. Codex hits $180. That single number tells you more about engineering productivity than any token dashboard ever will.
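The division is simple; the definition of "successful" is where teams disagree. The sketch below assumes a PR counts only if it merged and was not later reverted. That rule, the `PullRequest` record, and the function name are all illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    merged: bool
    reverted: bool = False  # assumption: a reverted PR was not production-ready


def cost_per_successful_pr(total_token_spend_usd, prs):
    """Total token spend divided by PRs that merged and stayed merged."""
    successful = sum(1 for pr in prs if pr.merged and not pr.reverted)
    if successful == 0:
        return None  # nothing shipped, so there is no meaningful ratio
    return total_token_spend_usd / successful
```

With $1,200 of spend and five PRs, of which three merged and stayed merged, this returns 400.0. Counting all five would have reported $240 and hidden two failures.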

ROI per Token. Latent Space's framework is straightforward: ROI = (Value Created - Token Cost) / Token Cost. If a chatbot interaction costs $0.01 in tokens and drives $0.10 in revenue, your ROI is 9x. If it costs $0.01 and drives nothing, your ROI is -1: every cent of spend is lost. The formula forces teams to define "value created" before they write the first prompt.
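The two cases above work out as follows. This is a direct transcription of the formula; the function name is hypothetical, and how "value created" gets attributed to a single interaction is left as an assumption the team has to settle.

```python
def roi_per_token(value_created_usd, token_cost_usd):
    """(Value Created - Token Cost) / Token Cost."""
    if token_cost_usd <= 0:
        raise ValueError("token cost must be positive")
    return (value_created_usd - token_cost_usd) / token_cost_usd


# Rounded because floating-point subtraction leaves a tiny residue.
print(round(roi_per_token(0.10, 0.01), 2))  # 9.0
print(round(roi_per_token(0.00, 0.01), 2))  # -1.0
```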

The Post-Subsidy Utility Check. This one comes from MEXC's May 2026 AI crypto report and applies far beyond crypto. Ask one question: would anyone use this system if the incentives disappeared? MEXC found that 70% of AI agent token activity was subsidized, meaning usage collapsed once rewards were removed. The same logic applies to enterprise AI tools. If engineers would not use the tool without the leaderboard pressure, the tool is not delivering value. It is delivering compliance.
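Inside a company, the same check can be run by comparing usage during a leaderboard period with usage after the leaderboard is switched off. A minimal sketch, assuming you can count sessions (or any consistent activity unit) in both periods; the function name and the choice of unit are hypothetical.

```python
def subsidized_share(usage_with_incentive, usage_without_incentive):
    """Share of activity that disappeared once the incentive was removed."""
    if usage_with_incentive <= 0:
        raise ValueError("need a positive baseline to compare against")
    dropped = max(usage_with_incentive - usage_without_incentive, 0)
    return dropped / usage_with_incentive
```

If 1,000 weekly sessions fall to 300 once the leaderboard is gone, the function returns 0.7: seven in ten sessions were compliance, not demand. Usage that rises after the incentive is removed is clamped to a share of zero.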

Model Routing as a Structural Guardrail. Route 80% or more of tasks to efficiency-tier models priced under $5 per million tokens. Reserve frontier models for the remaining 10 to 20% of tasks that genuinely require them. Latent Space estimates that prompt caching and semantic compression alone can cut bills by 75 to 90%. The 80/20 principle is not a suggestion here. It is an engineering architecture decision.
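A router can be as small as one threshold. The sketch below assumes each task arrives with a complexity score between 0 and 1; how that score is produced is out of scope here. The tier prices, the threshold, and every name are placeholder assumptions chosen to make the arithmetic easy, not quoted vendor rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float


# Placeholder prices (assumptions), not real vendor pricing.
EFFICIENCY = Tier("efficiency", 4.0)
FRONTIER = Tier("frontier", 40.0)


def route(task_complexity, threshold=0.8):
    """Use the frontier tier only when the complexity score clears the threshold."""
    return FRONTIER if task_complexity >= threshold else EFFICIENCY


def blended_cost(complexities, tokens_per_task):
    """Total cost in USD when every task is routed by its complexity score."""
    total = 0.0
    for complexity in complexities:
        price = route(complexity).usd_per_million_tokens
        total += price * tokens_per_task / 1_000_000
    return total
```

Under these assumed prices, ten tasks of one million tokens each cost $400 if all go to the frontier tier. Routing eight of them to the efficiency tier brings that to $112.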

Whether most organizations will adopt these frameworks voluntarily or whether it will take a few spectacular budget blowouts to force the shift remains an open question. My read: budget pain is the only reliable teacher. The companies that build outcome dashboards now will look like geniuses in 12 months. The rest will learn the hard way.

2031

Three signals inside the same shift

RECURSIVE WASTE

40%

Up to 40% of tokens in RAG workflows add zero informational value.

Latent Space found that engineers stuff prompts with redundant context to ensure completeness, not efficiency. Frontier models then produce responses 3 to 4 times longer than efficiency-tier alternatives. The result is compounding cost with no measurable quality improvement in merged pull requests.

COBRA FARMING

70%

70% of AI agent token activity collapses when subsidies disappear.

MEXC's May 2026 report found that most AI agent usage was incentive-driven, not value-driven. Amazon employees were reportedly inflating token consumption scores on internal leaderboards. Some teams set 80% weekly AI tool usage targets, creating the exact perverse incentive Goodhart's Law predicts.

OUTCOME PRICING

2031

Per-task pricing will replace per-token billing for most enterprise contracts by 2031.

OpenAI was already piloting per-task models in early 2026. Observability platforms like Faros AI and LangSmith are evolving to track cost-per-outcome rather than cost-per-call. Teams that build value-measurement flywheels now will compound a five-year efficiency advantage that late adopters cannot close quickly.

Zoom out five years and the token pricing model itself may look like a transitional artifact. Like per-minute long distance charges or per-text-message billing, token pricing exists because it maps neatly onto infrastructure costs. It does not map onto value delivered.

The asymmetric advantage belongs to companies that decouple their internal measurement systems from their vendors' billing units. Nvidia's near-bankruptcy in the late 1990s offers an instructive parallel. The company survived not by optimizing for the metrics Wall Street cared about at the time, but by building a flywheel around a capability (GPU compute) that the market had not yet learned to price correctly. The companies that build value-measurement flywheels around AI today are playing the same long game.

By 2031, I expect three structural shifts. First, per-task and per-outcome pricing will replace per-token pricing for most enterprise contracts. OpenAI was already piloting per-task models in early 2026. Second, observability platforms like Faros AI and LangSmith will become as standard as APM tools are today, tracking cost-per-outcome rather than cost-per-call. Third, the engineering teams that learned to measure value in 2026 will have compounded that discipline into a five-year efficiency advantage that late adopters cannot close quickly.

The Meter Trap is not an AI problem. It is a management problem wearing AI's clothes. The organizations that recognize this will treat token metrics as a diagnostic input, not a performance target. Salary buys furniture. Equity buys your future. The metric you optimize for determines which one you are building.

Storyblok's December 2025 survey found that 90% of companies plan to increase AI investment in 2026. The question is not whether they will spend more. The question is whether they will measure what matters. The compounding difference between those two paths is enormous.

What to Build This Weekend

You do not need a procurement overhaul to start measuring better. You need one dashboard and one afternoon.

Step 1: Pick one AI workflow your team runs daily. Code generation, content drafting, customer support summarization. Just one.

Step 2: Calculate your Cost per Successful Output. Total token spend on that workflow this week divided by the number of outputs that actually shipped, got merged, or reached a customer. If you use a unified API gateway like OfoxAI, pull the spend data from there. It routes requests across GPT, Claude, and other models, so you can see cost breakdowns by provider in one place.

Step 3: Set a token budget ceiling for that workflow. Not a target. A ceiling. The difference matters. A target says "hit this number." A ceiling says "do not exceed this number." Ceilings discourage inflation. Targets encourage it.

Step 4: Route 80% of tasks to your cheapest viable model. Test whether the output quality drops meaningfully. In most cases, it will not. If you are generating marketing copy, try running it through a mid-tier model first. Tools like SodaMarketing can turn a product URL into a video ad without requiring frontier-model inference at all. The output is the asset. The model tier is just a cost lever.

Step 5: Share your Cost per Successful Output number with your team on Monday. Not the token count. Not the throughput. The cost per thing that actually mattered. That single reframe changes the conversation.
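The five steps reduce to one small report per workflow per week. A minimal sketch, assuming you can export one record per run with its token count, its cost, and whether the output shipped. The `Run` record, the field names, and the report keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    tokens: int
    cost_usd: float
    shipped: bool  # merged, published, or delivered to a customer


def weekly_report(runs, token_ceiling):
    """Summarize one workflow's week: the outcome number plus a ceiling check."""
    total_tokens = sum(r.tokens for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    shipped = sum(1 for r in runs if r.shipped)
    return {
        # None when nothing shipped: there is no outcome to divide by.
        "cost_per_successful_output": total_cost / shipped if shipped else None,
        "shipped": shipped,
        # One-sided on purpose: staying far under the ceiling earns nothing.
        "over_ceiling": total_tokens > token_ceiling,
    }
```

The report deliberately leaves out total tokens and throughput. If the number is not on the Monday slide, nobody optimizes for it.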

Things will break. Your first ceiling will be wrong. Your routing rules will send a complex task to a cheap model and the output will be garbage. That is fine. The point is not perfection on day one. The point is building the muscle of measuring outcomes instead of inputs. Get your reps in. One workflow. One dashboard. One week of data. Then iterate.

The Meter Trap only works when nobody looks past the meter. Start looking.

DOJO · BUILD THIS WEEKEND

Build your first outcome dashboard in one afternoon.

Pick one daily AI workflow. Code generation, content drafting, or support summarization. Do not try to measure everything. Measure one thing correctly first.
Calculate your Cost per Successful Output. Divide total token spend on that workflow this week by the number of outputs that actually shipped, got merged, or reached a customer. If you use a unified API gateway, pull spend data by provider in one view.
Set a token budget ceiling, not a target. A ceiling caps waste without incentivizing consumption. Route 80% or more of tasks to efficiency-tier models priced under $5 per million tokens and reserve frontier models for the remaining 10 to 20% of tasks that genuinely require them.

THE BOTTOM LINE

The metric you optimize for determines whether you are building furniture or building a future.

The Meter Trap is not an AI problem. It is a management problem wearing AI's clothes. Ninety percent of companies plan to increase AI investment in 2026, but the compounding difference between measuring tokens and measuring outcomes is enormous. Organizations that connect spend to shipped value now will look like geniuses in 12 months. The rest will learn the hard way that budget pain is the only reliable teacher.

The Meter Trap: Why AI Teams Are Optimizing for Tokens Instead of Value
