Cursor 3 ships parallel agents and a 39% PR uplift. METR's randomized trial finds the same tools make experienced developers 19% slower. Both are true. Here's why that matters more than either headline alone.
Signal: Cursor crossed the line from editor to operating system
Cursor 3 launched on April 2, 2026 with an interface rebuilt from scratch. The classic IDE pane is gone. In its place is the Agents Window, a unified workspace where local agents, cloud agents, and agents kicked off from Slack, GitHub, and Linear all sit in one sidebar. You can run multiple agent sessions side-by-side or in a grid. You can hand a long-running task to a cloud agent and pull the result back. You can run the same prompt across two or three frontier models and apply the strongest answer.
The same tools boost throughput and slow individuals.
The numbers Cursor publishes are striking. Inside Cursor's own engineering team, 35% of merged pull requests are now written by autonomous agents. Across customer organizations, a University of Chicago study by Suproteem Sarkar found that companies merge 39% more PRs once Cursor's agents become the default. Cursor's CEO calls this the "third era" of software development, where breaking down problems and reviewing artifacts replaces typing code as the primary work. The framing is bold, and the numbers behind it are real.
If you build for a living, you have probably already felt the shift. Tab completions used to outnumber agent runs 2.5 to 1 across the user base. Today the relationship has flipped: twice as many people now run autonomous agents as use Tab. The mental model is moving from "AI suggests, you decide" to "you scope, agents execute, you review."
Build: The METR study tells a quieter story, with sharper teeth
While Cursor was rolling out 3.0, a research team at METR ran a randomized controlled trial on 16 senior developers working on real open-source issues. Half the tasks ran with Cursor Pro and Claude Sonnet 4.5. Half ran without. The sample is small but the methodology is rigorous, and the result lands hard.
Before the study, the developers predicted AI would make them 24% faster. After the study, they reported feeling 20% faster. Their actual time-on-task showed they were 19% slower. The gap between perception and reality, a 20% perceived speedup stacked against a 19% measured slowdown, comes to 39 percentage points, almost exactly the size of Cursor's published productivity gain.
The time breakdown explains where the slowdown comes from. Coding time dropped, exactly as you would expect when an agent writes the first draft. But the time spent around the agent grew, and it grew more than coding shrank. Reviewing agent output ate the largest slice. Prompting and waiting on the agent ate the next. IDE overhead ate a smaller but persistent slice. Net effect: 19% slower per task, with no detectable change in code quality or revert rate.
Outside the lab, GitClear's analysis of public repositories tells a complementary story. Code churn, the rate at which freshly merged code gets reverted or rewritten, climbed from a 3.3% baseline in 2021 to between 5.7% and 7.1% in 2024-2025. More code is shipping. More of it is also coming back. The METR finding that only 39% of agent generations were accepted without rework fits that pattern.
This is not an indictment of Cursor 3 or of agent-driven development. The 39% PR uplift is real. So is the 19% slowdown. They measure different things on different time horizons. Cursor's number measures organizational throughput. METR's measures individual time on a single task. Both can be true at once, and together they describe a tool that is changing what work counts as work.
Unlock: One developer was 38% faster. The skill curve is the lesson.
Three signals inside the same shift
Organizations merge significantly more PRs with agent-default workflows.
A University of Chicago study found companies merge 39% more pull requests once Cursor's agents become the default. Inside Cursor's own team, 35% of merged PRs are now written by autonomous agents. The ratio of agent runs to tab completions has inverted across the user base.
Experienced developers are measurably slower on individual tasks.
METR's randomized trial found senior developers took 19% longer per task with AI agents. Coding time dropped as expected, but reviewing agent output and prompting overhead grew more than coding shrank. Only 39% of agent generations were accepted without rework.
One developer with 50 hours of practice broke the average entirely.
Inside METR's 16-person study, a single developer with 50 hours of prior Cursor usage was 38% faster than baseline. The other 15 were slower. METR's authors were explicit: the learning curve is high enough that early adoption reduces measured performance while developers climb it.
That third signal is the one worth dwelling on. Fifty hours of prior Cursor usage was the difference between the study's average 19% slowdown and a 38% speedup, and METR's authors were explicit about why: the learning curve for agent-driven development is steep enough that bolting it onto an existing workflow reduces measured performance while you climb. Once you reach the top, the gains compound.
If you build software for a living, the takeaway is not "AI agents make you slower" or "AI agents make you faster." It is that AI agents are a separate skill that you have to deliberately practice, the way you once practiced Vim, or test-driven development, or pair programming. Three habits separate the outlier from the average:
- Time-box generations. Decide before you prompt how many minutes of agent runtime are worth less than just writing the code yourself. When the timer hits zero, stop. Most of the slowdown comes from developers waiting on an agent to finish a task they could have done in three minutes by hand.
- Review like a senior reviewing a junior. Agent code is plausible-looking and frequently wrong in subtle ways. Read it the way you read an unfamiliar contributor's first PR. Look for invented APIs, missing edge cases, and shortcuts that hide a wrong assumption.
- Decompose before delegating. The biggest gains come from giving agents tightly scoped, well-specified work, not "build this feature." If you cannot describe the task in one paragraph, the agent cannot execute it cleanly. Spend that paragraph before you spend the prompt.
The reason this works is not magic. It is the same reason senior engineers ship more than junior engineers. The bottleneck moves from typing to thinking, and thinking is where compounding lives.
Bottom line
The numbers Cursor publishes are honest. The METR numbers are also honest. The gap between them is not a contradiction; it is a snapshot of a transition that costs something to learn and pays back once you do. Builders who treat AI agents as a managed team outperform builders who treat them as magic.
Practice the agent skill before it practices on you.
- Time-box every agent generation. Before you prompt, set a hard limit on how many minutes of agent runtime justify the wait. If you could write the code in three minutes by hand, do not spend five watching an agent try. Track your cutoff times in a simple log for one week to calibrate your instincts; a minimal sketch of such a log follows this list.
- Review agent output like a senior reviewing a junior's first PR. Read every generated block for invented APIs, missing edge cases, and shortcuts hiding wrong assumptions. Flag each issue in comments as if you were onboarding a new contributor. This habit builds the review muscle that separates the 38% outlier from the 19% slowdown.
- Decompose tasks into one-paragraph specs before delegating. Open a scratch file and write a single paragraph describing the exact scope, inputs, outputs, and constraints of the work. If you cannot fit it in one paragraph, split the task. Hand the paragraph to the agent as the prompt. Tightly scoped work is where agent gains compound.
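If you want to make the time-boxing log concrete, here is a minimal Python sketch, not part of Cursor or any study's tooling: the filename, CSV fields, and command-line arguments are all illustrative choices, and the only thing it does is record how long an agent run actually took against the budget you set before prompting.

```python
#!/usr/bin/env python3
"""Minimal agent time-box log: one row per agent run, reviewed after a week."""
import csv
import sys
import time
from datetime import datetime
from pathlib import Path

LOG = Path("agent_timebox_log.csv")  # illustrative filename; keep it anywhere

def record(task: str, budget_min: float) -> None:
    """Start a timer against a hand-coding budget; log whether the agent run beat it."""
    print(f"Budget: {budget_min} min. Press Enter once the agent's output is merged or abandoned.")
    start = time.monotonic()
    input()
    elapsed_min = (time.monotonic() - start) / 60
    beat_budget = elapsed_min <= budget_min
    is_new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new_file:
            writer.writerow(["timestamp", "task", "budget_min", "elapsed_min", "beat_budget"])
        writer.writerow([
            datetime.now().isoformat(timespec="seconds"),
            task,
            budget_min,
            round(elapsed_min, 1),
            beat_budget,
        ])
    print(f"Logged {elapsed_min:.1f} min ({'under' if beat_budget else 'over'} budget).")

if __name__ == "__main__":
    # Example: python timebox.py "refactor auth middleware" 5
    record(sys.argv[1], float(sys.argv[2]))
```

Run it once per agent session for a week, then read the CSV: the rows that blew their budget tell you where your personal cutoff sits, in your own codebase, with your own prompts.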
The tool is real. The skill is real. Both compound.
Cursor 3 is the most ambitious dev tool release of the year, and its 39% PR uplift is backed by independent research. METR's 19% slowdown is equally real, measured on the same class of tools with rigorous methodology. The gap is not a contradiction. It is a learning curve with a steep cost and a compounding payoff. Builders who treat AI agents as a managed team, not as magic, are the ones who will land on the right side of both numbers.