The Claude Code Leak Revealed a Token Drain Bug. The Real Problem Is Bigger.
The leaked Claude Code source revealed a caching bug silently burning tokens on every session resume. Here's what it means for AI pricing and your workflow.
Follow-up to: Anthropic Is Losing Money on You Every Month. What Are You Shipping?
Three weeks ago, I wrote that Anthropic is losing money on every subscriber and that smart developers should ship like crazy before the economics normalize.
I was right about the thesis. I was wrong about the timeline.
The window isn’t closing in 18-24 months. It’s closing now.
What Changed in Three Weeks
Three things happened in rapid succession that accelerated the timeline:
1. Claude subscriptions doubled. Anthropic’s paid user base went from ~30k to ~60k subscribers between January and March 2026. Record growth. The Claude Code launch, Super Bowl buzz, and Cowork tools drove a wave of new signups.
2. Rate limits got brutal. Users on r/ClaudeAI went from “this is amazing” to “I can’t work” practically overnight. Pro users ($20/month) report hitting 10% of their daily quota from a single prompt. Max users ($100-200/month) report the same degradation. One Max 20x subscriber — paying $200/month — couldn’t work for nine consecutive days.
3. The source code leaked. On March 31, 2026, a 59.8 MB source map file was accidentally shipped in the Claude Code npm package. 512,000 lines of TypeScript, mirrored across GitHub within hours. And buried in that code was proof of something users had been complaining about for weeks.
The Token Drain Bug
Here’s what the leak revealed.
Claude Code has a function called db8 that filters what gets saved to session files. For non-Anthropic users, it strips out all attachment-type messages — including deferred_tools_delta records that track which tools the model already knows about.
When you resume a session, Claude Code scans your history to figure out what tools it already announced. But because db8 nuked those records, it finds nothing. So it re-announces every deferred tool from scratch. Every. Single. Resume.
This breaks prompt caching in three ways:
1. System reminders shift positions in the message array
2. The billing hash changes because the first message content differs
3. The cache breakpoint moves because the array length is different
Result: your entire conversation rebuilds as cache_creation tokens instead of hitting cache_read. The longer the conversation, the worse the drain.
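To make the mechanics concrete, here's a minimal TypeScript sketch of the failure mode. All names except `deferred_tools_delta` are invented for illustration (the real leaked code is far larger); the point is only to show how stripping the announcement records from saved history forces a prefix mismatch on resume:

```typescript
type Message = { type: string; content: string };

// Stand-in for the leaked db8 filter: drops attachment-type records from
// the saved session file, including the deferred_tools_delta entries that
// record which tools were already announced.
function filterForSave(history: Message[]): Message[] {
  return history.filter((m) => m.type !== "attachment");
}

// On resume, the client scans history for prior tool announcements.
// Finding none (db8 deleted them), it prepends fresh announcements,
// shifting every later message's position in the array.
function resume(saved: Message[], deferredTools: string[]): Message[] {
  const announced = saved.some(
    (m) => m.type === "attachment" && m.content.startsWith("deferred_tools_delta")
  );
  if (announced) return saved;
  const reannounce: Message[] = deferredTools.map((t) => ({
    type: "attachment",
    content: `deferred_tools_delta:${t}`,
  }));
  return [...reannounce, ...saved];
}

// Toy cache key: prompt caching matches on an exact message prefix, so any
// change to early messages invalidates the cached prefix and the whole
// conversation is rebilled as cache_creation instead of cache_read.
function cacheKey(history: Message[]): string {
  return history.map((m) => `${m.type}|${m.content}`).join("\n");
}
```

Run the filtered history through `resume` and the cache key no longer matches the original conversation; skip the filter and it does. That single prefix mismatch is the entire drain.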
One user patched the two-line fix and posted it. His 5-hour usage dropped from spiralling out of control to 6% — normal levels. The post got 367 upvotes. A sharp commenter noted the patch also bypasses billing controls on cache TTL, which makes it more than just a bug fix — but let's set that aside.
Here’s the uncomfortable part: this bug was burning tokens silently for weeks. Users were complaining about rate limits. Anthropic’s status page showed “no incidents.” And the actual cause was a caching bug in their own client code.
The Math Doesn’t Work
Let’s do the numbers.
Anthropic’s annualized revenue is roughly $14 billion. Claude Code alone accounts for $2.5 billion of that run rate — up from $500 million just three months earlier. Consumer subscriptions generated about $1.2 billion in 2025, with 1,000%+ year-over-year growth.
Sounds great, right? Until you look at the other side of the ledger.
Anthropic burned approximately $5.2 billion in 2025. They’ve committed over $80 billion in cloud infrastructure costs through 2029. They just raised $30 billion in a Series G at a $380 billion valuation — the second-largest private tech financing ever, behind only OpenAI.
They’re buying compute at a staggering scale: 1 million Google TPUv7 chips (~$52 billion deal), a dedicated 1,200-acre AWS data center campus in Indiana ($11 billion), and a $50 billion deal with Fluidstack for facilities in Texas and New York. Total committed compute: over 2 gigawatts.
All of this is funded by venture capital and strategic investors (Amazon’s $8B+, Google’s $3B+). Not by your $20/month Pro subscription.
Anthropic projects positive free cash flow by 2027-2028. That’s the plan. But plans require the revenue to actually materialize, the compute to come online in time, and the unit economics to hold as usage scales.
Right now, 60,000 subscribers are overwhelming the existing infrastructure so badly that paying customers can’t work.
The Subsidy Is Collapsing Under Its Own Success
Here’s the dynamic I didn’t fully appreciate three weeks ago.
The subsidy doesn’t end with a price increase. It ends with degradation.
Anthropic can’t raise prices on Pro from $20 to $50 tomorrow — that would cause a revolt and hand users to OpenAI and Google. But they can let the service get worse at the current price. Tighter rate limits. More frequent throttling. Peak-hour queuing. Features that work “sometimes.”
This is exactly what’s happening.
The math is simple. Double the subscribers on the same compute = everyone gets half the capacity. As one Reddit user put it: “selling more seats on the same plane and wondering why legroom is shrinking.”
And Anthropic isn’t alone. Google slashed Gemini API free tier quotas by 50-92% overnight in December 2025. One developer went from 300M+ input tokens per week to hitting limits at less than 9M. OpenAI’s ChatGPT Pro at $200/month is the only major offering that effectively removes caps — but at ten times the price of a Pro subscription.
The pattern across the industry: subsidized tiers are getting squeezed. The compute costs are real. And the bill always comes due.
Why I’m Not Hitting Limits (And You Might Not Be Either)
Here’s a mystery. Despite all this chaos, I’ve barely noticed the rate limits. After reading the threads and the leaked source code, I think I know why.
Fresh sessions. I almost never resume sessions, and the biggest token drain fires on session resume. My workflow — fresh sessions, agent registration per session, structured CLAUDE.md — accidentally dodges this bug entirely.
Surgical prompts. I don’t say “explore my codebase.” I say “read this file and fix this function.” My beads-based task tracking means every session has a specific objective. No wandering. No 94k-token “Explore” runs.
Time zone arbitrage. IST puts my working hours outside US peak times. When r/ClaudeAI is screaming about rate limits at 2 PM Eastern, it’s midnight for me. I’m coding at 6 AM IST when San Francisco is asleep.
Structured context. Between CLAUDE.md, ARCHITECTURE.md, and explicit file paths, Claude doesn’t need to discover my codebase. It already knows the layout. That’s 90% less indexing work.
This isn’t luck. It’s workflow design. But it reinforces the point from my original post: the subsidy rewards those who use it efficiently. Wasteful usage — open-ended exploration, resumed conversations, vague prompts — burns tokens at 10-50x the rate of focused work.
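For concreteness, a project memory file along these lines is what lets the model skip discovery. The layout, paths, and rules below are invented for illustration — yours will differ:

```markdown
# CLAUDE.md (illustrative sketch — contents are hypothetical)

## Project layout
- src/api/   — HTTP handlers
- src/core/  — business logic, no I/O
- src/db/    — query layer

## Conventions
- TypeScript strict mode; tests live next to source as *.test.ts

## Working rules
- Read the file I name before proposing edits
- No open-ended exploration; ask me for a path instead
```

A few dozen lines like this, loaded at session start, replace the thousands of tokens Claude would otherwise spend rediscovering the codebase on every run.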
What This Means For You
If you read my original post and thought you had 18-24 months — you might, on paper. Anthropic has the cash. They have the compute commitments. They project $70 billion in revenue by 2028 and $17 billion in free cash flow.
But the experience of using the product is degrading right now. Not in 18 months. Now.
Here’s what actually matters:
1. Ship before the experience degrades further. The window isn’t about pricing — it’s about capability per dollar. Today, $20/month gets you frontier model access that would have cost $500/month in API calls two years ago. That ratio is moving in the wrong direction as more users pile in.
2. Optimize your workflow. Start fresh sessions. Use CLAUDE.md and ARCHITECTURE.md. Be specific in your prompts. Avoid “Explore” and open-ended commands. These aren’t just productivity tips — they’re rate limit survival strategies.
3. Don’t build on the assumption of unlimited AI access. If your product or workflow requires constant frontier model access at current prices, you’re building on borrowed time. Build systems that work with AI but can degrade gracefully. Ship products that generate revenue independent of your development tools.
4. The enterprise pivot is coming. Anthropic’s enterprise revenue is already 80% of total. They have 300,000+ business customers, with large accounts (>$100K ARR) growing 7x year-over-year. Follow the money: consumer subscriptions are the loss leader. Enterprise is the business. When push comes to shove, enterprise gets the compute.
The Real Lesson
The leaked source code is a metaphor for the entire AI subsidy era.
For weeks, users were burning through rate limits at impossible speeds. They blamed themselves (“skill issue”), they blamed Anthropic (“fix your limits”), they blamed the model (“Claude got dumber”). The actual cause was a two-line bug in a caching function that nobody could see because the code was proprietary.
That’s the subsidy in miniature. You’re using a product where you can’t see the internals, can’t predict the costs, and can’t control when the rules change. The value is extraordinary — right now. But you’re a guest in someone else’s infrastructure, running on someone else’s VC money, subject to someone else’s capacity planning.
The smartest move hasn’t changed since three weeks ago. Ship. Build durable assets — products, content, audiences, skills — while the arbitrage is still available.
But do it faster than you planned. The window isn’t closing in 18 months.
The glass is already cracking.

