The Great Inference Land Grab

A chip startup nobody had heard of just booked more customer contracts than the equity it raised. Etched came out of stealth on June 30, 2026 with $800 million raised, a working chip, and over $1 billion in signed customer contracts. The chip does not train AI models. It only runs them.

Jane Street put in more than $100 million. Hudson River Trading, Two Sigma, and Jump Trading joined too. These are the firms that buy inference silicon by the rack. When market makers who profit off microseconds put their own money into a chip, that tells you something. I think it tells you the AI compute market is fragmenting faster than the sell-side models assumed.

Here is why that matters for every builder, investor, and founder betting on where AI costs go next.

The Serving Split

For three years the whole AI story was about training. Bigger clusters. More GPUs. Nvidia sold the shovels and everyone paid full price. That was the arc most forecasts baked in through the late 2020s.

COMPUTE MARKETS · JUNE 2026
ETCHED · JANE STREET · FRACTILE · GOOGLE

The capital pointed at inference-only silicon before any benchmark shipped.

Etched equity raised (out of stealth June 30): $800M
Signed customer contracts (Etched, before broad shipment): $1B
Jane Street check (lead investor, market maker): $100M
Fractile inference round (compute-memory co-location): $220M

But training is a one-time cost. Serving a model happens every single time a user hits enter. Call this the Serving Split: the moment inference breaks away from training and becomes its own market with its own hardware, its own vendors, and its own economics.
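The arithmetic behind that claim fits in a few lines. This is a minimal sketch, and every figure in it is a hypothetical placeholder rather than a number from Etched or any other vendor. It only shows how quickly a recurring serving bill catches up with a fixed training bill.

```python
import math

def requests_to_match_training(training_cost_usd: float,
                               cost_per_1k_requests_usd: float) -> int:
    """Requests served before cumulative serving spend equals the training bill."""
    if cost_per_1k_requests_usd <= 0:
        raise ValueError("cost_per_1k_requests_usd must be positive")
    return math.ceil(training_cost_usd / cost_per_1k_requests_usd * 1000)

# Hypothetical inputs, chosen only for illustration.
TRAINING_COST_USD = 50_000_000      # one-time cost to train the model
COST_PER_1K_REQUESTS_USD = 2        # recurring cost to serve 1,000 requests
REQUESTS_PER_DAY = 100_000_000      # traffic once the product is live

breakeven_requests = requests_to_match_training(
    TRAINING_COST_USD, COST_PER_1K_REQUESTS_USD
)
breakeven_days = breakeven_requests / REQUESTS_PER_DAY

print(f"Serving spend matches training after {breakeven_requests:,} requests")
print(f"At this traffic level that takes {breakeven_days:.0f} days")
```

Under these made-up inputs the serving bill passes the training bill in well under a year, and it keeps growing after that. Change the traffic assumption and the crossover moves, but the shape does not.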

Etched is the clearest proof of the split so far. Its chip, called Sohu, bakes the transformer architecture directly into silicon. It does one thing. It runs models. It does not pretend to also train them.

That narrowness is the whole point. A general-purpose GPU wastes silicon on flexibility you do not need at inference time. Sohu trades flexibility for throughput. When your cost center is serving, not building, that trade can flip the economics.
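To see how that trade can flip the economics, here is a rough sketch of cost per million tokens as a function of hourly hardware cost and throughput. Both device profiles are hypothetical. No independent throughput or pricing numbers exist for Sohu, so treat the inputs as knobs to turn, not as a benchmark.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars to serve one million tokens on a device at a given throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical profiles: the specialized part costs more per hour
# but pushes far more tokens through in that hour.
general = cost_per_million_tokens(hourly_cost_usd=4.0, tokens_per_second=2_000)
specialized = cost_per_million_tokens(hourly_cost_usd=6.0, tokens_per_second=12_000)
ratio = general / specialized

print(f"general-purpose: ${general:.3f} per million tokens")
print(f"specialized:     ${specialized:.3f} per million tokens")
print(f"general costs {ratio:.1f}x more per token under these assumptions")
```

The specialized part wins here only because the assumed throughput gain outruns the assumed price premium. Shrink the throughput gap and the advantage disappears, which is why independent benchmarks matter.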

Why Sophisticated Buyers Signed Before the Benchmarks

The most interesting number is not the $800 million. It is the $1 billion in contracts signed before broad commercial shipment. Buyers committed capital before any independent benchmark existed.

"Buyers committed capital before any independent benchmark existed." · KODA EDITORIAL · JUNE 2026

No third-party organization has published production throughput numbers for Sohu yet. No MLPerf results. No standardized dollars-per-token comparison. The company achieved first-pass A0 silicon on TSMC's N4P process, which is real, but real is not the same as proven at scale.

So why did people sign? My read is that this is a scarcity signal, not a quality signal. When compute is this constrained, whoever can ship working inference racks wins revenue right now.

I want to be honest about the risk here. Signing contracts before proof at scale is a bet, not a verdict. The customers behind those contracts have not been named. The contracts may carry milestones and contingencies that never convert to full revenue if yields, thermals, or timelines slip.

There is a bigger structural risk too. Etched is betting that transformer inference stays stable enough to justify fixed-function silicon. If frontier models drift toward state-space models or hybrid attention variants, a chip baked for transformers cannot patch its way out. You respin the silicon, which is slow and expensive.

Then there is Nvidia's real moat, which is not the chip. It is CUDA, the libraries, the tooling, and the millions of developers who already know the stack. Whether enterprises will accept the integration pain of a new architecture to save on dollars-per-token is unclear. Many will pay a premium to avoid fragmentation.

Watch the counterpositioning here. Etched is not attacking Nvidia head-on across the whole stack. It picked the one workload, inference, where specialization can beat generality, and it built the full rack around it: boards, cooling, networking. That is a systems business, not just a chip business. The nicher the target, the faster the wedge.

The pattern extends past Etched. Google announced in April 2026 that a version of its own chips would focus specifically on inference. Fractile raised $220 million for inference chips that co-locate compute and memory.

That is a lot of capital pointed at the same thesis. And when Nvidia, the incumbent, itself pays $20 billion to license an inference-optimized architecture, the market is telling you the serving battleground is real.

2031

Three signals inside the same shift

THE SERVING SPLIT ($1B): Inference is breaking away from training as its own market. Etched's Sohu bakes the transformer architecture directly into silicon and does nothing but run models. Training is a one-time cost, but serving happens every time a user hits enter. That recurring dollar is where a specialized chip can flip the economics.

SCARCITY SIGNAL (N4P): Buyers signed before any independent benchmark existed. No MLPerf results, no standardized dollars-per-token comparison, only first-pass A0 silicon on TSMC's N4P process. My read is that this is a scarcity signal, not a quality signal. When compute is this constrained, whoever ships working inference racks wins revenue now.

MOAT RISK ($20B): Nvidia's real moat is CUDA, not the chip. The tooling and millions of developers who know the stack are the barrier. Etched also bets transformer inference stays stable enough for fixed-function silicon. If frontier models drift to state-space or hybrid attention, a baked chip cannot patch out. You respin, which is slow and expensive.

Pull back five years. The question is not whether Etched wins. The question is what the split does to compute pricing.

For most of AI's short history, compute was treated as uniformly scarce and effectively GPU-only. That assumption is what let one vendor hold pricing power across the entire stack. The Serving Split threatens that assumption at its most valuable point, because inference is where the recurring dollars live.

Here is the asymmetric bet. If specialized inference silicon captures even a slice of serving demand, buyers gain the ability to dual-source. Dual-sourcing erodes pricing power. Eroded pricing power compounds into lower cost-per-token across the industry.
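A back-of-envelope version of that bet, with hypothetical prices throughout. The sketch only captures the direct effect of moving some traffic to a cheaper second source. The indirect effect, the incumbent cutting price because you can credibly walk away, is the bigger prize and is not modeled here.

```python
def blended_cost(incumbent_price: float, alt_price: float, alt_share: float) -> float:
    """Average price per million tokens when alt_share of traffic moves to a second vendor."""
    if not 0.0 <= alt_share <= 1.0:
        raise ValueError("alt_share must be between 0 and 1")
    return alt_share * alt_price + (1.0 - alt_share) * incumbent_price

def savings_fraction(incumbent_price: float, alt_price: float, alt_share: float) -> float:
    """Fraction of the single-source bill saved by dual-sourcing."""
    blended = blended_cost(incumbent_price, alt_price, alt_share)
    return 1.0 - blended / incumbent_price

# Hypothetical prices in dollars per million tokens.
INCUMBENT_PRICE = 1.00
ALT_PRICE = 0.50

for share in (0.0, 0.25, 0.5):
    cost = blended_cost(INCUMBENT_PRICE, ALT_PRICE, share)
    saved = savings_fraction(INCUMBENT_PRICE, ALT_PRICE, share)
    print(f"{share:.0%} on second source: ${cost:.3f} per million tokens ({saved:.1%} saved)")
```

Even a quarter of traffic on a half-price second source trims the bill noticeably in this toy setup. The real numbers will depend on prices nobody has published yet.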

The contrarian case is equally sharp. More capital chasing a bottleneck can increase Nvidia's pricing power in the short term, not decrease it. If Etched stumbles on yield or Nvidia closes the efficiency gap with each generation, this episode reads as proof of how hard the moat is to breach, not evidence it is gone. The data is mixed on which way this resolves.

My honest view sits in the middle. I do not think inference is fully commoditized in 2026. I do think the machinery of commoditization is now visibly assembling, and faster than most forecasts penciled in. Committing $1 billion to contracts before broad shipment only makes sense if large buyers intend to multi-source inference at scale.

Only shipped chips are real. The rest is contracts and slides. But the direction of travel is hard to unsee once you notice who is funding it.

What to Build This Weekend

You cannot buy a Sohu chip this weekend. But you can build the muscle that matters when compute gets cheaper: shipping products that run on inference, not products that need to train anything.

Inference means running an already-trained model. Training means building one from scratch. You want to live entirely in the first category, because that is where costs are about to fall.

First, pick one narrow use case. The nicher you go, the faster you go. A tool that summarizes one specific type of document beats a general assistant nobody remembers.

Second, build the front end fast. Open bolt.new, describe your app in plain language, and let it generate a working full-stack-style starting point. You are not writing infrastructure. You are wrapping a model in a useful interface.

Third, wire in an existing model API for the actual work. If you are summarizing dense research, SciSummary already does the heavy lifting on scientific text. Study how it turns walls of text into clean output, then copy the pattern for your niche.

Fourth, test until it breaks, then fix it. Things will break. That is normal, not failure. Feed it messy inputs and watch where it stumbles.
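Steps three and four can be sketched together. The example below is a minimal sketch, not a recipe for any particular provider. `call_model` stands in for whatever model API you choose, `fake_model` is a hypothetical offline stub so the code runs without an account, and `MAX_CHARS` is an invented cap you would replace with your model's real limit.

```python
from typing import Callable, List, Tuple

MAX_CHARS = 8_000  # hypothetical input cap; real limits depend on the model you call

def clean_input(text: str) -> str:
    """Collapse whitespace and reject empty input before paying for a model call."""
    cleaned = " ".join(text.split())
    if not cleaned:
        raise ValueError("empty document")
    return cleaned[:MAX_CHARS]

def summarize(text: str, call_model: Callable[[str], str]) -> str:
    """Wrap a model call. call_model is whichever API client you plug in."""
    prompt = "Summarize this document in three sentences:\n\n" + clean_input(text)
    summary = call_model(prompt).strip()
    if not summary:
        raise RuntimeError("model returned an empty summary")
    return summary

def fake_model(prompt: str) -> str:
    """Offline stand-in for a real API: echoes the first 60 characters of the document."""
    body = prompt.split("\n\n", 1)[1]
    return body[:60]

def run_messy_inputs(inputs: List[str],
                     call_model: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Feed awkward inputs through the wrapper and record which ones fail and why."""
    failures = []
    for raw in inputs:
        try:
            summarize(raw, call_model)
        except Exception as exc:
            failures.append((repr(raw[:20]), type(exc).__name__))
    return failures

messy = ["", "   \n\t ", "Normal abstract about protein folding.", "x" * 50_000]
failures = run_messy_inputs(messy, fake_model)
for sample, error in failures:
    print(f"{sample} failed with {error}")
```

Passing the model in as a function is the design choice that matters. It lets you swap vendors later without touching the rest of the app, which is the small-scale version of dual-sourcing.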

You do not need a CS degree for any of this. The chips are getting specialized so serving gets cheap. Your job is to be ready with something worth serving when it does. Get your reps in this weekend.

DOJO · BUILD THIS WEEKEND

Ship something worth serving before compute gets cheap.

Pick one narrow use case. The nicher you go, the faster you go. A tool that summarizes one specific type of document beats a general assistant nobody remembers.
Build the front end fast. Open bolt.new, describe your app in plain language, and let it generate a working full-stack-style starting point. You are wrapping a model in a useful interface, not writing infrastructure.
Wire in an existing model API. For dense research, study how SciSummary turns walls of text into clean output, then copy the pattern for your niche. Feed it messy inputs, watch where it stumbles, and fix it until it holds.

THE BOTTOM LINE

The machinery of commoditization is now visibly assembling.

Inference is not fully commoditized in 2026, but the direction of travel is hard to unsee once you notice who is funding it. If specialized silicon captures even a slice of serving demand, buyers gain the ability to dual-source, and eroded pricing power compounds into lower cost-per-token across the industry. The contrarian case is equally sharp: more capital chasing a bottleneck can lift pricing power in the short term, and a yield stumble reads as proof of how hard the moat is to breach. Only shipped chips are real, and the rest is contracts and slides. Your job is to be ready with something worth serving when the cost falls.
