K Koda Intelligence
exploreDeep Dive

Google's Cheapest Model Just Beat Its Most Expensive One.
That Changes Everything About Pricing

Gemini 3.5 Flash launched at Google I/O on May 20 as a generally available model, scoring 76.2% on Terminal-Bench 2.1 and outperforming Gemini 3.1 Pro on most coding and agentic benchmarks. With a 1M token context window and distribution across billions of Google surfaces, the "budget" tier just collapsed the frontier pricing architecture.

7 MIN READ · BY THE KODA EDITORIAL TEAM · STRATEGY · AI MODEL ECONOMICS
headphones
LISTEN TO THE DEEP DIVE~2 min conversation
smart_display
WATCH THE VISUAL NARRATIVEAnimated breakdown · ~2 min
play_arrow
Play · YouTube
TERMINAL-BENCH 2.176.2%↑ GOOGLE I/O CONTEXT WINDOW1M TOKEN· GEMINI 3.5 FLASH GDPVAL-AA ELO1656↑ BENCHMARK LAUNCH DATEMAY 20· GENERALLY AVAILABLE VS GEMINI 3.1 PROBEATS↑ KEY BENCHMARKS GEMINI 3.5 PRONEXT MONTH· EXPECTED ROLLOUT ALONGSIDE FLASHGEMINI SPARK· GOOGLE ANNOUNCEMENT TERMINAL-BENCH 2.176.2%↑ GOOGLE I/O CONTEXT WINDOW1M TOKEN· GEMINI 3.5 FLASH GDPVAL-AA ELO1656↑ BENCHMARK LAUNCH DATEMAY 20· GENERALLY AVAILABLE VS GEMINI 3.1 PROBEATS↑ KEY BENCHMARKS GEMINI 3.5 PRONEXT MONTH· EXPECTED ROLLOUT ALONGSIDE FLASHGEMINI SPARK· GOOGLE ANNOUNCEMENT

Google just made its cheapest model better than its most expensive one. On May 19, 2026, Gemini 3.5 Flash launched at Google I/O and outperformed Gemini 3.1 Pro on most coding and agentic benchmarks. It runs roughly 4x faster than comparable frontier models. And Google immediately made it the default engine behind Search, the Gemini app, and its new Antigravity coding platform.

That is not a product announcement. That is a pricing architecture collapsing in on itself.

Here is what it means for every builder, buyer, and strategist watching the AI model market.

The Compression Principle

A pattern keeps repeating across every generation of AI models. Call it the Compression Principle: the capabilities that cost $100 today will cost $30 in 12 months and $5 in 24 months. But the companies that win are not the ones who wait for the price to drop. They are the ones who lock in distribution while the compression is happening.

BENCHMARK LEDGER · MAY 2026GOOGLE I/O · TERMINAL-BENCH · GDPVAL-AA · MCP ATLAS

Gemini 3.5 Flash vs. the field on agentic and coding benchmarks.

Terminal-Bench 2.1 Gemini 3.5 Flash · agentic coding
76.2%
GDPval-AA Elo Gemini 3.5 Flash · agentic alignment
1656
MCP Atlas Gemini 3.5 Flash · multi-step tool use
83.6%
Gemini 3.1 Pro (T-Bench) Previous gen · baseline comparison
70.3%

Google just executed this playbook in public. Gemini 3.5 Flash scores 83.6% on MCP Atlas, a multi-step agentic workflow benchmark. That beats Gemini 3.1 Pro (78.2%), Claude Opus 4.7 (79.1%), and GPT-5.5 (75.3%). On Terminal-Bench 2.1, Flash hits 76.2% compared to 3.1 Pro's 70.3%. The "cheap" model now outperforms the "smart" model on the tasks that matter most for agents and code.

The framework is simple. Intelligence compresses downward. Distribution compounds upward. The company that pushes frontier capability into the cheapest, fastest tier and then pipes it through billions of existing touchpoints creates a flywheel that competitors cannot replicate with model quality alone.

Google has over 4 billion Search users, more than 3 billion Android devices, and roughly 2 billion Gmail accounts. Gemini 3.5 Flash is now the default model behind AI Mode in Search and the Gemini app globally. That is not a beta. That is distribution at planetary scale, running on a model that costs roughly a third of what competitors charge for similar performance.

The Real Bet Behind Flash: Agents Over Answers

Here is where the strategic picture gets interesting. Google did not optimize 3.5 Flash to be a better chatbot. They optimized it to be a better worker.

Intelligence compresses downward. Distribution compounds upward. The company that pushes frontier capability into the cheapest, fastest tier and then pipes it through billions of existing touchpoints creates a flywheel that competitors cannot replicate with model quality alone.· KODA EDITORIAL ANALYSIS · MAY 2026

Look at where the benchmark gains concentrate. MCP Atlas measures multi-step tool use across the Model Context Protocol. Terminal-Bench 2.1 measures agentic terminal coding. GDPval-AA measures agentic alignment. Google reported a 42% improvement over Gemini 3 Flash on long-range, multi-turn cybersecurity task sets. Every major gain points in one direction: sustained, autonomous execution across multiple steps.

This is a counterpositioning move. While OpenAI and Anthropic compete on reasoning depth and dense context fidelity, Google is betting that the primary commercial value of AI shifts from "answering questions" to "completing workflows." The distinction matters enormously.

Consider the products launched alongside Flash. Gemini Spark is a personal AI agent that works 24/7, even when your laptop is closed. It ships to Google AI Ultra subscribers in the U.S. within the week. Universal Cart uses Gemini plus a new Universal Commerce Protocol to pull products from across the web, track deals, and make recommendations through Gmail, YouTube, and the Gemini app. Antigravity 2.0 is Google's coding agent platform, described as their answer to Copilot, Codex, and Claude Code combined, and it claims 12x faster performance than the previous version.

Every one of these products requires a model that is fast, cheap, and reliable across many sequential tool calls. That is exactly what 3.5 Flash is built for.

But here is the honest hedge. It is unclear whether agentic benchmark performance translates to real-world reliability at scale. A model can score well on scripted multi-step tasks and still fall apart on ambiguous instructions, adversarial tool outputs, or brittle UI environments. "More capable" can also mean "more confident in wrong plans," which is a genuinely dangerous trait for agents operating autonomously. The gap between benchmark wins and production trust remains wide.

There is also a pricing contradiction worth sitting with. Flash used to mean cheap. Gemini 3.5 Flash reportedly costs $1.50 per million input tokens and $9 per million output tokens, compared to $0.50 and $3 for the previous Gemini 3 Flash. That is a 3x price increase on the "budget" tier. Sundar Pichai framed it as "less than half the price" of other frontier models, which is true when you compare it to GPT-5 class or Claude Opus class pricing. But for teams that built their cost models around Flash-tier economics, this is a meaningful jump.

My read on this: Google is deliberately blurring the line between Flash and Pro. If 3.5 Flash beats 3.1 Pro on most benchmarks, and 3.5 Pro is coming next month, the question becomes what "Pro" even means anymore. I think Google is moving toward a world where Flash is the production workhorse for 90% of use cases and Pro becomes a specialized tool for the hardest reasoning and longest context windows. That segmentation only works if the boundaries are clear and stable. Google's history with product lines does not inspire confidence on that front.

The reasoning benchmarks tell the other side of the story. On ARC-AGI-2, it hits 72.1% compared to 77.1%. Dense long-context recall at 128k tokens also reportedly trails Pro. These are not trivial gaps for teams building applications that require deep analytical reasoning or faithful recall across massive documents.

The asymmetric bet Google is making: most commercial value lives in speed and tool use, not in the last 5% of reasoning depth. That bet could be exactly right. Or it could leave an opening for Anthropic and OpenAI to own the "trust layer" of AI, the applications where getting the answer wrong carries real consequences.

2031

Three signals inside the same shift

PRICING COLLAPSE

The Flash tier got 3x more expensive but still undercuts the frontier.

Gemini 3.5 Flash costs $1.50 per million input tokens, a 3x jump from the previous Flash generation. But Google frames it as less than half the price of GPT-5 class or Claude Opus class models. The "budget" label now means something entirely different.

AGENTS OVER ANSWERS
76.2%

Every major benchmark gain points toward sustained autonomous execution.

Terminal-Bench 2.1 at 76.2%, MCP Atlas at 83.6%, and a 42% improvement on long-range cybersecurity task sets all signal the same bet. Google optimized Flash to complete workflows, not answer questions. Products like Gemini Spark and Universal Cart are built on this foundation.

DISTRIBUTION FLYWHEEL
1M

A 1M token context window deployed across 4 billion Search users.

Gemini 3.5 Flash is now the default engine behind AI Mode in Search and the Gemini app globally. Google processes over 3 trillion tokens per day internally with this model. That volume creates a data flywheel and integration surface that compounds with each quarter.

Zoom out five years. The Compression Principle does not stop at one generation. If Flash-tier models already match last year's Pro-tier models, then by 2028 or 2029, today's frontier intelligence will run on devices, in browsers, at the edge. The cost per token approaches zero for most tasks.

The companies that win in 2031 are not the ones with the best model. They are the ones with the deepest distribution and the stickiest agent ecosystems. This is why Google's move matters more than any single benchmark number.

Google is processing more than 3 trillion tokens per day internally with 3.5 Flash. That is roughly 34.7 million tokens per second, sustained. That volume creates a data flywheel for improving the model, a cost curve that drops with scale, and an integration surface that becomes harder to rip out with each passing quarter.

The parallel is Costco's $1.50 hot dog. You do not make money on the hot dog. You make money on the membership and the basket. Google does not need to make money on Gemini tokens directly. They need Gemini to make Search stickier, Workspace indispensable, and Cloud the default platform for agent deployment. The $100 per month AI Ultra plan and the $200 Enterprise tier are the membership fees. The model is the hot dog.

The risk for every other AI company is that Google's distribution advantage compounds faster than their model quality advantage. Anthropic can build a safer, more thoughtful reasoning engine. OpenAI can push the frontier on raw intelligence. But if 80% of commercial AI tasks run through Google surfaces by default, the market for standalone model providers narrows considerably.

I think the next 18 months will determine whether the AI market looks like search (one dominant player with a massive distribution moat) or like cloud computing (three to four viable platforms with differentiated strengths). Google's I/O announcements suggest they are betting hard on the first outcome.

What to Build This Weekend

Stop reading about models. Start building with one.

First, open Google AI Studio and get access to Gemini 3.5 Flash through the API. It is generally available as of May 19. The model ID is gemini-3.5-flash. Run a simple multi-step task: give it a prompt that requires two tool calls in sequence. See how it handles the handoff between steps. That will tell you more about agentic reliability than any benchmark table.

Second, test the speed claim yourself. Generate a 2,000 token response and time it. Independent testers reported roughly 300 tokens per second under good conditions. Compare that to whatever model you currently use in production. If the speed difference is real for your workload, recalculate your cost model with the new pricing ($1.50 per million input, $9 per million output).

Third, if you teach or package knowledge, try Honen v1.2 this week. Upload your rough notes or expertise on any topic and let it structure a course outline. Pair that with Gemini 3.5 Flash through the API to generate supporting content. You do not need a CS degree. You need one afternoon and a willingness to break things.

Fourth, build one tiny agent. Use the Gemini API's function calling capability to connect Flash to a single external tool: a calendar, a spreadsheet, a database. Make it do one useful thing autonomously. That is the skill that matters now. Not prompting. Not chatting. Orchestrating.

The models will keep getting cheaper and faster. The builders who learn to wire them into real workflows this month will have a compounding advantage over everyone who waits for the next announcement.

DOJO · BUILD THIS WEEKEND

Stop reading about models. Start building with one.

  1. Test agentic reliability firsthand. Open Google AI Studio, access Gemini 3.5 Flash via the API (model ID: gemini-3.5-flash), and run a prompt requiring two sequential tool calls. Observe how it handles the handoff between steps. That tells you more than any benchmark table.
  2. Benchmark the speed claim against your workload. Generate a 2,000 token response and time it. Independent testers reported roughly 300 tokens per second. Compare that to your current production model and recalculate your cost model with the new pricing: $1.50 per million input, $9 per million output.
  3. Build one agent loop before Monday. Pick a repetitive workflow you do weekly, such as summarizing emails, triaging tickets, or drafting outreach. Wire Gemini 3.5 Flash into a simple script that executes three steps autonomously. Ship ugly. Learn where it breaks.
THE BOTTOM LINE

The model is the hot dog. Distribution is the membership.

Google just made its cheapest model outperform its most expensive one, then deployed it across billions of touchpoints in a single day. The Compression Principle means today's frontier intelligence becomes tomorrow's commodity tier. The companies that win are not the ones with the best model but the ones with the deepest distribution and the stickiest agent ecosystems. The next 18 months will determine whether the AI market consolidates like search or diversifies like cloud. Google is betting hard on the first outcome.

Want this every morning?

AI analysis, world news, markets, and tools. One briefing, delivered free.

One email per day. No spam. Unsubscribe anytime.