The Arms Race Flipped

Google's Threat Intelligence Group just confirmed the first zero-day exploit it believes was developed with AI assistance. A prominent cybercrime group built a Python script that bypasses two-factor authentication on a popular open-source admin tool. Google caught it before mass exploitation began. The exploit carried hallmarks of machine generation: hallucinated CVSS scores in the comments, textbook docstrings, and formatting so clean it practically screamed "LLM output." Zero-day exploits like this one historically sell for $500,000 to $2.5 million on black markets. Now an AI model can help produce one for the cost of an API call.

I think this is the most important cybersecurity story of 2025 so far. Not because one exploit changed everything overnight. But because it proves a threshold has been crossed. AI is no longer just writing phishing emails. It is finding logic flaws in authentication code and turning them into weapons. Both sides of the arms race are now running on the same engine.

Here is the framework for understanding what this means, what to watch, and what to build right now.

The Compression Principle

The core insight fits on an index card. AI compresses the time between discovering a flaw, weaponizing it, and deploying it at scale. Call it The Compression Principle.

COMPRESSION PRINCIPLE · MAY 2025
Sources: Google GTIG · IBM Cost of a Data Breach 2024 · Gartner

The numbers behind AI's new role in the exploit pipeline.

Breach containment with AI (IBM 2024, days to contain): 214 days
Breach containment without AI (IBM 2024, days to contain): 308 days
Zero-day black market range (Google GTIG, exploit pricing): $0.5-2.5M
Avg breach cost, no automation (IBM 2024, per incident): $5.36M

Before AI entered the exploit pipeline, the chain looked like this. A skilled researcher spends weeks or months reading source code. They find a subtle logic flaw. They write a proof of concept. They test it. They package it for deployment. Each step required deep expertise and serious time.

AI collapses those steps. According to Google's report, North Korean group APT45 sent "thousands of repetitive prompts" to analyze CVEs and validate proof-of-concept exploits. China-linked actors deployed agentic tools called Strix and Hexstrike against a Japanese tech firm. The pattern is consistent: AI handles the tedious middle work of reading documentation, generating variants, and validating attack paths.

The Compression Principle works on defense too. Google's Big Sleep agent, built by DeepMind and Project Zero, found the same 2FA bypass flaw before the criminals could deploy it. The IBM Cost of a Data Breach Report 2024 found that organizations with extensive AI and automation in their security stack contained breaches in 214 days versus 308 days without it. That 94-day gap is The Compression Principle in action, on the good side.
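The arithmetic behind that gap is worth seeing once. This is a minimal sketch using only the two IBM figures quoted above; the function name is mine, not IBM's.

```python
# Containment figures quoted above from the IBM Cost of a Data Breach Report 2024.
DAYS_WITH_AI = 214
DAYS_WITHOUT_AI = 308


def containment_gap(with_ai: int, without_ai: int) -> tuple[int, float]:
    """Return (days saved, fraction of the slower timeline saved)."""
    saved = without_ai - with_ai
    return saved, saved / without_ai


saved_days, saved_fraction = containment_gap(DAYS_WITH_AI, DAYS_WITHOUT_AI)
print(f"{saved_days} days faster ({saved_fraction:.1%} of the 308-day baseline)")
# prints: 94 days faster (30.5% of the 308-day baseline)
```

Put differently, the AI-equipped group cut roughly 30% off the containment timeline.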

The strategic question is not whether AI helps attackers or defenders. It helps both. The question is which side compresses faster. Right now, it's unclear whether most organizations can keep pace.

The 500 IQ Intern Just Learned to Pick Locks

Let me translate what happened here into plain language. Think of a frontier AI model as a 500 IQ intern. It reads fast. It follows instructions. It never gets tired. Until recently, that intern was mostly useful for writing emails, summarizing documents, and generating boilerplate code. Helpful but not dangerous.

"Probably the tip of the iceberg." · JOHN HULTQUIST, CHIEF ANALYST AT GOOGLE GTIG · VIA CYBERSCOOP 2025

Now that intern can read the entire source code of an open-source admin tool, understand the authentication flow, spot a place where the logic says "trust this session without checking 2FA," and write a working exploit script. That is a qualitative jump. The vulnerability Google found was not a buffer overflow or a memory corruption bug. It was a semantic logic flaw, a place where the code was functionally correct but insecure. Traditional scanners and fuzzers don't catch these. They look for crashes and malformed inputs. A logic flaw looks like normal code doing exactly what it was told to do, except what it was told to do is wrong.
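To make "functionally correct but insecure" concrete, here is a hypothetical sketch. It is not the flaw Google reported, whose details are not given here. Every name (`LoginRequest`, `remember_device`, `login_flawed`, `login_fixed`) is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoginRequest:
    password_ok: bool          # first factor, assumed verified upstream
    totp_ok: bool              # result of the second-factor check
    remember_device: str = ""  # client-supplied "trusted device" value


def login_flawed(req: LoginRequest) -> bool:
    """Hypothetical handler: never crashes, but trusts a client-supplied value."""
    if not req.password_ok:
        return False
    if req.remember_device:    # BUG: any non-empty value skips the 2FA check
        return True
    return req.totp_ok


def login_fixed(req: LoginRequest, trusted_devices: frozenset[str]) -> bool:
    """Same flow, but the shortcut only applies to tokens the server issued."""
    if not req.password_ok:
        return False
    if req.remember_device in trusted_devices:
        return True
    return req.totp_ok


attacker = LoginRequest(password_ok=True, totp_ok=False, remember_device="anything")
print(login_flawed(attacker))                       # True: 2FA skipped
print(login_fixed(attacker, frozenset({"tok-1"})))  # False: 2FA enforced
```

A fuzzer would see nothing wrong with `login_flawed`. It raises no exceptions and returns a boolean every time. The bug lives entirely in what the code chooses to trust.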

This is the 20% insight that matters more than the other 80% of this story. Most security tooling is optimized for the wrong class of bug. Signature-based detection, input fuzzing, static analysis for known vulnerability patterns: these are Tractors. Ugly, functional, built for a world where exploits came from memory corruption. The new threat is a Ferrari with a working engine: polished on the surface and fully functional underneath. It looks like clean Python. It reads like a tutorial. And it bypasses your 2FA.

Google's GTIG report says the exploit contained "an abundance of educational docstrings" and "a structured, textbook Pythonic format highly characteristic of LLMs training data." John Hultquist, chief analyst at GTIG, told CyberScoop this is "probably the tip of the iceberg." My read: he is right. If one cybercrime group figured this out, others already have. Google explicitly said the model used was not Gemini and likely not Anthropic's offering. That means attackers are shopping around, finding whichever model gives them the best results with the fewest guardrails.

Here is the practical workflow shift security teams need to make. Stop treating logic flaws as edge cases. They are now first-class exploit targets. Map every authentication flow in your stack. Ask one question about each: "Is there any path through this code that skips a required check?" That is exactly the question an AI model can answer, for both attackers and defenders. The narrower and more specific your threat model, the faster you find real problems.
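One way to ask that question mechanically is to write the flow down as a graph and search it. The sketch below assumes a made-up flow. The node names and the `paths_skipping` helper are hypothetical, and a real flow would have to be mapped by hand from your own code first.

```python
# Hypothetical auth flow as a directed graph: node -> list of next steps.
AUTH_FLOW = {
    "credentials": ["password_check"],
    "password_check": ["totp_check", "trusted_session"],
    "trusted_session": ["session_created"],  # shortcut edge worth auditing
    "totp_check": ["session_created"],
    "session_created": [],
}


def paths_skipping(flow, start, goal, required):
    """Return every start-to-goal path that never visits `required`."""
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            if required not in path:
                found.append(path)
            continue
        for nxt in flow.get(node, []):
            if nxt not in path:  # do not revisit nodes, so cyclic flows terminate
                stack.append(path + [nxt])
    return found


for p in paths_skipping(AUTH_FLOW, "credentials", "session_created", "totp_check"):
    print(" -> ".join(p))
# prints: credentials -> password_check -> trusted_session -> session_created
```

Every path it prints is a place where a session gets created without the required check. Each one needs either a written justification or a fix.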

Three concrete changes to make this week. First, audit every 2FA implementation for hardcoded trust exceptions. The Google exploit targeted exactly this pattern. Second, restrict internet-facing access to admin tools. If your web-based admin panel is reachable from the public internet, you are extending an open invitation. Third, deploy phishing-resistant MFA like FIDO2 or WebAuthn hardware keys. If the attacker needs a physical token they cannot copy through a logic flaw, the entire exploit class becomes harder to execute.
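For the first item, a crude text scan is a reasonable starting point. This is a sketch, not a real static analyzer. The regex patterns are guesses at common naming conventions and the sample source is invented. Expect both false positives and misses.

```python
import re

# Hypothetical patterns; tune them to the naming conventions in your own codebase.
SUSPICIOUS = [
    re.compile(r"(skip|bypass|disable)_?(2fa|mfa|otp|totp)", re.IGNORECASE),
    re.compile(r"trusted_(ip|ips|device|session)", re.IGNORECASE),
    re.compile(r"(2fa|mfa)_?(exempt|whitelist|allowlist)", re.IGNORECASE),
]


def find_trust_exceptions(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for lines matching any pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in SUSPICIOUS):
            hits.append((number, line.strip()))
    return hits


# Invented sample; the leading newline makes "def verify" line 2.
SAMPLE = '''
def verify(user, request):
    if request.ip in TRUSTED_IPS:
        return True
    if user.skip_2fa:
        return True
    return check_totp(user, request.code)
'''

for number, line in find_trust_exceptions(SAMPLE):
    print(f"line {number}: {line}")
```

To run it over a repository, read each source file as text and pass the contents to `find_trust_exceptions`. Treat every hit as a question to answer, not as a confirmed bug.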

One caveat worth flagging: the evidence that this exploit was AI-generated is strong but circumstantial. Google based its "high confidence" assessment on code style, not on access to model logs. Style is suggestive, not definitive. Humans copy from tutorials and Stack Overflow too. But even if AI only helped with weaponization rather than discovery, The Compression Principle still applies. The gap between finding a flaw and deploying it at scale is where defenders intervene. If AI shrinks that gap, the economics of offense shift permanently.

2030

Three signals inside the same shift

OFFENSE COMPRESSED

Thousands of prompts

AI shrinks the gap between flaw discovery and weaponization.

Google confirmed APT45 sent thousands of repetitive prompts to analyze CVEs and validate proof-of-concept exploits. China-linked actors deployed agentic tools called Strix and Hexstrike against a Japanese tech firm. The time from vulnerability to weapon is collapsing on the attacker side.

DEFENSE ACCELERATES

214 days

AI-equipped defenders contain breaches 94 days faster.

IBM's 2024 data shows organizations with extensive AI and automation contained breaches in 214 days versus 308 without it. Google's Big Sleep agent, built by DeepMind and Project Zero, found the same 2FA bypass before criminals could deploy it. The loop between detection and response is the new competitive moat.

LOGIC FLAW ERA

$2.5M

Semantic bugs are the new exploit frontier, and scanners miss them.

The zero-day was not a buffer overflow or memory corruption. It was a semantic logic flaw where code was functionally correct but insecure. Traditional fuzzers and signature-based tools cannot catch these. AI models can read authentication flows and spot skipped checks, making logic flaws a first-class exploit target.

Pull back five years. Where does this land?

The cybersecurity market already exceeds $200 billion annually according to Gartner estimates. The AI cybersecurity segment is growing at aggressive double-digit rates. By 2030, every serious security operations center will run AI-assisted triage as a baseline, not a differentiator. The asymmetric advantage will belong to whoever builds the tightest feedback loop between detection and response.

Think about what compounding looks like here. Today, AI helps find one logic flaw in one admin tool. In 18 months, AI agents will scan entire open-source dependency trees for authentication inconsistencies across thousands of packages simultaneously. The same capability on defense means tools like Big Sleep and CodeMender will patch flaws before they ever reach production. The race is not about who has AI. Everyone will have AI. The race is about who builds the faster loop.

The deeper strategic shift is about identity, not code. If AI-assisted attackers are targeting 2FA bypass paths today, they will target session management, OAuth flows, and SSO trust chains tomorrow. Security budgets that still allocate 70% to perimeter tools and 30% to identity have it backwards. The compounding bet is on identity security, software supply chain integrity, and secure-by-design engineering. Salary buys firewalls. Equity buys an authentication architecture that AI cannot logic-flaw its way through.

The contrarian risk is overreaction. Governments may impose broad restrictions on AI use in security research, hurting defenders more than attackers. Executives may redirect budgets from basic hygiene (patching, segmentation, backups) toward flashy "AI threat hunting platforms" that sound impressive in board presentations but miss the fundamentals. According to IBM's 2024 data, the average breach still costs $5.36 million for organizations without automation. The unsexy work of patching fast and reducing exposed surfaces still delivers the highest ROI per dollar spent.

The organizations that win in 2030 will not be the ones with the most sophisticated AI. They will be the ones that compressed the loop between "flaw discovered" and "flaw patched" to near zero. That is The Compression Principle applied to defense. And it starts with decisions made this quarter.
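You cannot compress a loop you do not measure. Here is a minimal sketch of tracking the gap between "flaw discovered" and "flaw patched". The findings and dates are invented placeholders, and `days_to_patch` is a hypothetical helper.

```python
from datetime import date
from statistics import median

# Hypothetical remediation log: (identifier, discovered, patched).
FINDINGS = [
    ("auth-bypass", date(2025, 5, 1), date(2025, 5, 3)),
    ("stale-dependency", date(2025, 5, 2), date(2025, 5, 20)),
    ("exposed-admin-panel", date(2025, 5, 10), date(2025, 5, 11)),
]


def days_to_patch(findings):
    """Return the number of days each finding stayed open."""
    return [(patched - discovered).days for _, discovered, patched in findings]


durations = days_to_patch(FINDINGS)
print(f"median: {median(durations)} days, worst: {max(durations)} days")
# prints: median: 2 days, worst: 18 days
```

The median tells you how the loop usually runs. The worst case tells you where an attacker gets the most time.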

What to Build This Weekend

You do not need a security engineering degree to start closing the gap. Here are four things you can do before Monday.

First, run an authentication audit on your own tools. Pick one admin panel or internal tool your team uses. Trace the login flow from credential entry through session creation. Write down every point where a check happens. Then ask: "Is there any path that skips one of these checks?" You can use Claude or ChatGPT to help you read the source code if it is open source. Paste the auth module in and ask it to find trust assumptions. This is exactly what attackers are doing. Do it first.

Second, check your exposure. Use Shodan or Censys to search for your organization's IP ranges. Look for any web-based admin tools accessible from the public internet. If you find one, restrict access to a VPN or zero-trust network access immediately. The Google exploit targeted a web-based admin tool. If yours is not internet-facing, a remote attacker has to get inside your network before this attack class even applies.
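Before you open Shodan, a quick pass over your own inventory can flag the obvious cases. This sketch uses Python's standard `ipaddress` module. The inventory is hypothetical, and `8.8.8.8` stands in for "some public address" purely for illustration. A private address does not prove a tool is unreachable, since NAT and port forwarding can still expose it, so this complements an external scan and does not replace one.

```python
import ipaddress

# Hypothetical inventory: admin tool name -> address it listens on.
ADMIN_ENDPOINTS = {
    "db-admin": "10.0.4.12",
    "ci-dashboard": "192.168.1.50",
    "legacy-panel": "8.8.8.8",  # placeholder public address for illustration
}


def publicly_routable(endpoints: dict[str, str]) -> list[str]:
    """Return the names of tools whose address is globally routable."""
    return sorted(
        name
        for name, addr in endpoints.items()
        if ipaddress.ip_address(addr).is_global
    )


print(publicly_routable(ADMIN_ENDPOINTS))
# prints: ['legacy-panel']
```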

Third, try AI Aware from this week's digest. It bundles detection for AI-generated text, deepfake video, synthetic images, and cloned audio into one tool. If AI-generated exploit code is now a reality, AI-generated social engineering is already widespread. Having a detection layer for synthetic content is no longer optional.

Fourth, set up a simple vulnerability monitoring workflow. Osaurus, also from this week's digest, lets you run local and cloud AI models side by side on your Mac. Use it to build a lightweight agent that pulls new CVE disclosures from NIST's National Vulnerability Database and summarizes which ones affect your stack. You do not need a commercial threat intelligence platform to start. You need 20 minutes, an API key, and a willingness to test something that might break on the first try. That is fine. Test aggressively. Fix what breaks. Get your reps in.
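The fetch-and-filter half of that agent fits in one file. The endpoint, the `pubStartDate` and `pubEndDate` parameters, and the response fields below follow my understanding of the NVD CVE API 2.0, so check them against NIST's documentation before relying on them. The stack keywords and sample CVE IDs are made-up placeholders. Handing the matches to a model for summarizing is left out.

```python
import json
import urllib.parse
import urllib.request

# Assumed NVD CVE API 2.0 endpoint; verify against NIST's current documentation.
NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"
STACK_KEYWORDS = ["openssl", "django", "nginx"]  # hypothetical stack


def build_query(start_iso: str, end_iso: str) -> str:
    """Build a CVE query URL for a publication window (ISO-8601 timestamps)."""
    params = {"pubStartDate": start_iso, "pubEndDate": end_iso}
    return f"{NVD_URL}?{urllib.parse.urlencode(params)}"


def fetch(url: str) -> dict:
    """Download and parse one page of results (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def relevant_cves(payload: dict, keywords: list[str]) -> list[tuple[str, str]]:
    """Return (CVE id, English description) pairs mentioning any keyword."""
    matches = []
    for item in payload.get("vulnerabilities", []):
        cve = item.get("cve", {})
        text = next(
            (d["value"] for d in cve.get("descriptions", []) if d.get("lang") == "en"),
            "",
        )
        if any(k in text.lower() for k in keywords):
            matches.append((cve.get("id", "unknown"), text))
    return matches


# Offline sample with made-up IDs, shaped like the assumed response format,
# so the filter can be tried without a network call.
SAMPLE = {"vulnerabilities": [
    {"cve": {"id": "CVE-0000-0001",
             "descriptions": [{"lang": "en", "value": "Flaw in Nginx proxy module."}]}},
    {"cve": {"id": "CVE-0000-0002",
             "descriptions": [{"lang": "en", "value": "Issue in an unrelated product."}]}},
]}
print(relevant_cves(SAMPLE, STACK_KEYWORDS))
```

In real use you would call `fetch(build_query(...))` on a schedule and pass the result to `relevant_cves`. Keyword matching on descriptions is blunt. It is enough to get your reps in, and you can tighten it once you see what it misses.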

The Compression Principle works for you too. The faster you close the loop between learning about a threat and doing something about it, the less any single exploit matters. Start small. Start this weekend.

DOJO · BUILD THIS WEEKEND

Audit your authentication flows before AI does it for you.

Trace every login path in one admin tool. Pick your most exposed internal panel. Map credential entry through session creation. Write down every checkpoint and ask: is there any path that skips a required check? This is exactly the question the AI-generated exploit answered.
Kill hardcoded trust exceptions in 2FA. Search your codebase for session tokens or IP ranges that bypass second-factor verification. The Google exploit targeted exactly this pattern. Remove every exception or replace it with phishing-resistant MFA like FIDO2 or WebAuthn hardware keys.
Restrict admin tool access to internal networks only. If any web-based admin panel is reachable from the public internet, pull it behind a VPN or zero-trust proxy today. Reducing the attack surface is still the highest ROI per dollar spent, even in the age of AI-generated exploits.

THE BOTTOM LINE

The race is not about who has AI. It is about who builds the faster loop.

AI just crossed the threshold from writing phishing emails to finding and weaponizing logic flaws in authentication code. The organizations that win are not the ones with the most sophisticated models. They are the ones that compress the gap between flaw discovered and flaw patched to near zero. That means investing in identity security, software supply chain integrity, and secure-by-design engineering this quarter. The Compression Principle works for both sides. The only question is which side runs the tighter loop.

AI Just Found Its First Zero-Day.
Both Sides Are Armed Now.