Claude Code Is Incredible. It Also Almost Shipped 8 Production Bombs Last Week.
What 15 years of production scars are still good for
Last week I caught eight bugs across three projects. Not typos. Not missing semicolons. Real, ship-breaking, production-melting problems that would have sailed right past code review and into the waiting arms of actual users.
Claude Code wrote the code. Claude Code passed the tests. Claude Code would have happily deployed it.
And Claude Code had no idea anything was wrong.
I’ve written about comprehension debt and argued that clean code is dead. But here’s the uncomfortable third act: even if you understand your specs perfectly and verify your outcomes ruthlessly, there’s a whole category of knowledge that AI simply doesn’t have.
Call it production intuition. Call it battle scars. Call it “I’ve been burned by this exact thing before.”
Whatever you call it, you can’t prompt your way into it.
The Localhost Delusion
Here’s what AI is optimized for: making code that works on your machine, right now, with your current data, under ideal conditions.
Here’s what AI is catastrophically bad at: imagining your code running on three replicas behind a load balancer at 3am when the database is under pressure and someone’s running a batch job that nobody documented.
A Fortune 100 developer on Reddit put it bluntly: “There’s a lot of vibe coded slop that works well for MVP but will absolutely fall apart under stress and within production environments once they scale to more users. It doesn’t really reveal itself until later when it’s much more difficult to fix.”
Later. When it’s difficult. The horror.
Let me walk you through what “later” looked like for me this week.
Pattern 1: Production Blindness
The Concurrency Landmine
I’m building a tool that routes MCP calls. Claude wrote it in Python — clean, well-structured, exactly what I asked for. Worked beautifully in testing. One request, one response, everybody’s happy.
Then I imagined what happens when three Claude Code sessions hit it simultaneously.
Oh.
Oh no.
Python’s threading model and the phrase “parallel requests” get along about as well as cats and bathtubs. Claude’s solution? More Python. Refactor this, optimize that. Very confident. Would’ve worked for lower concurrency.
But I knew this thing needed to handle dozens of parallel sessions. I prompted it to rewrite in Go. Claude nailed the port — goroutines, channels, the works. Problem solved in an afternoon.
The code was excellent. The language selection wasn’t. Claude optimizes brilliantly within the box you give it. It just won’t question whether you’re in the right box.
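The underlying limitation is easy to demonstrate. A minimal sketch, assuming standard CPython with the GIL (the function names are mine, not the routing tool's): CPU-bound work gains nothing from threads, which is why "more Python" couldn't fix the bottleneck.

```python
import threading
import time

def cpu_work(n):
    # Pure-CPU loop: under CPython's GIL, only one thread
    # can execute Python bytecode at any moment.
    return sum(i * i for i in range(n))

N = 200_000

# Sequential baseline: four chunks of work, one after another.
start = time.perf_counter()
sequential = [cpu_work(N) for _ in range(4)]
seq_time = time.perf_counter() - start

# Same work on four threads: concurrent, but not parallel.
results = [None] * 4

def worker(i):
    results[i] = cpu_work(N)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
thr_time = time.perf_counter() - start

# On a GIL build, thr_time is roughly seq_time (or worse):
# the threads serialize on the interpreter lock. Same answers,
# no speedup — which is fine for I/O, fatal for CPU-bound routing.
assert results == sequential
```

Go's goroutines sidestep this entirely, which is why the language swap, not another refactor, was the fix.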
The Replicas Problem
Auth rate limiting. Claude implemented it in-memory with a clean sliding window algorithm. Textbook correct. Tests pass. Ship it.
One replica: perfect.
Two replicas: every user gets double the rate limit.
Three replicas: chaos.
This is distributed systems 101. The kind of thing you learn after you’ve been paged at 2am because requests landing on different replicas each got a fresh counter: 3x the rate limit, no extra effort from anyone.
When I pointed this out, Claude immediately suggested Redis-backed rate limiting with proper distributed locking. Great solution. But it didn’t think about replicas until I did. Claude builds for the deployment model you describe. If you don’t describe it, localhost is the default.
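Here's a sketch of the failure mode, assuming an in-memory sliding window like the one Claude wrote (class and method names are mine): the algorithm is correct per process, and that is exactly the problem.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Correct on a single process. On N replicas, each process
    keeps its own `hits`, so every user quietly gets N x the limit."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and now - q[0] >= self.window:
            q.popleft()  # evict requests that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

# Simulate a round-robin load balancer spreading one user
# across two replicas.
replica_a = SlidingWindowLimiter(limit=5, window_seconds=60)
replica_b = SlidingWindowLimiter(limit=5, window_seconds=60)

allowed = 0
for i in range(10):
    limiter = replica_a if i % 2 == 0 else replica_b
    if limiter.allow("user-1"):
        allowed += 1
# Intended limit: 5. Actual: 10. Each replica grants its own 5.
```

The Redis fix works because it centralizes the counters: every replica consults the same shared state instead of its own private memory.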
The Sync Task Landmine
An API endpoint runs a long-running task. Claude implemented it synchronously — straightforward, easy to understand, does exactly what the tests verify.
Deploy it. First user clicks the button. 30-second timeout. 504 Gateway Timeout.
When I explained the problem, Claude refactored it to Celery with proper task queuing, retry logic, and status polling. Solid implementation. Took maybe 20 minutes.
But here’s the thing: I only caught it because I tested the actual user flow, not just the unit tests. Claude implemented what I asked for. I didn’t ask for “an endpoint that won’t timeout in production.” That’s on me. But it’s also the kind of thing I’ve learned to check after watching synchronous endpoints die approximately 47 times.
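Celery is what we landed on, but the shape of the fix can be sketched with the standard library alone (names here are illustrative, not the production code): accept the request, hand back a task id immediately, let the client poll.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

class BackgroundTasks:
    """Minimal enqueue-and-poll pattern. A real deployment would use
    Celery or RQ with a broker so tasks survive process restarts
    and retries are durable."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._futures = {}

    def submit(self, fn, *args):
        task_id = str(uuid.uuid4())
        self._futures[task_id] = self._pool.submit(fn, *args)
        return task_id  # the endpoint returns 202 + this id, instantly

    def status(self, task_id):
        future = self._futures[task_id]
        if not future.done():
            return {"state": "pending"}
        return {"state": "done", "result": future.result()}

tasks = BackgroundTasks()
task_id = tasks.submit(sum, [1, 2, 3])
# The HTTP response goes out now; the work finishes whenever it finishes.
```

No 30-second request ever has to stay open, so the gateway timeout simply can't happen.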
Pattern 2: Ecosystem Amnesia
The Deprecation You Won’t Find on Stack Overflow
Supabase deprecated their API keys. Not in a big announcement. Not in the docs you’d naturally read. In a GitHub discussion with 43 comments and a lot of confused developers.
I found out because I read the discussion. I then had to fix three separate projects.
Claude doesn’t read GitHub discussions. Claude doesn’t know what the community is grumbling about. Claude’s knowledge is frozen in time, and the ecosystem keeps moving.
The “Works But Wrong” Framework Choice
Claude encrypted secrets at rest using Fernet keys. Technically correct. Cryptographically sound. Tests pass. Secure enough.
But I’m using Supabase. Supabase has a vault feature built specifically for this. When I mentioned it, Claude migrated everything over cleanly — proper RLS policies, the works.
The Fernet implementation wasn’t wrong. It just wasn’t the right choice for this stack. Supabase vault means one less thing I manage, one less key rotation I handle, one less piece of infrastructure to think about.
Claude doesn’t know the zeitgeist of “what’s the idiomatic way to do this in Supabase.” That knowledge lives in community forums, Discord servers, and the muscle memory of people who’ve shipped Supabase apps before.
The Build System That Wasn’t
I made UI mockups with Tailwind CSS (using Claude, to be fair). Told Claude to use them. Claude happily served Tailwind from a CDN.
In development? Fine.
In production? Every page load fetches the entire Tailwind library. Uncompiled. Unoptimized. Approximately 300KB of CSS you don’t need.
Claude knows what Tailwind is. Claude doesn’t know that real projects compile it. That’s the kind of tribal knowledge you pick up by shipping things and watching your Lighthouse scores crater.
Pattern 3: Verification Vacuum
The Cache That Wasn’t
Database queries were getting slow. I asked Claude to add a caching layer. Claude wrote a beautiful Redis caching module — proper TTLs, cache invalidation on writes, the works. Tests for the module passed. I shipped it, watched the deployment go green, felt the warm glow of productivity.
The cache wasn’t being hit.
Claude built an excellent caching module. Claude did not check whether the actual query functions were calling the cache. The module worked perfectly. Nothing was using it. Every request still hit the database directly.
I discovered this by checking Redis after a few hundred requests. Empty. Revolutionary debugging technique, I know.
Could tests have caught this? Sure. Integration tests that verify “when I make this API call, Redis gets a cache entry.” But I’d need to know to write that test. Claude wrote unit tests for the caching module. I should have asked for end-to-end verification. The knowledge that “modules can exist without being properly wired up” is experience. Pattern recognition. The scar tissue from shipping features that weren’t actually features before.
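The check that would have caught it is small. A toy version, with an in-process dict standing in for Redis and hypothetical names: count what actually reaches the database, not what the cache module does in isolation.

```python
class Cache:
    """Stand-in for Redis: this is the module that passed all its tests."""
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

db_calls = 0

def fetch_user(user_id):
    # Pretend database query. The counter is the ground truth:
    # if caching works end to end, this runs once per key.
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}

cache = Cache()

def get_user(user_id):
    # The wiring that was missing in my project: the query path
    # must actually consult the cache, or the module is decoration.
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = fetch_user(user_id)
    cache.set(key, value)
    return value

get_user(42)
get_user(42)
# End-to-end verification: one DB call, and the cache holds the entry.
```

The unit tests for `Cache` alone would pass with or without `get_user` ever calling it. Only the counter exposes the difference.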
Pattern 4: Architectural Judgment
The Multitenancy Time Bomb
A project using Qdrant vector database. Users store embeddings. Multiple users. Shared infrastructure.
The question that should wake you up at night: Can User A see User B’s data?
Claude’s implementation used collection-level isolation with proper tenant IDs in the filter queries. Reasonable approach. Worked in my tests.
But a thorough multitenancy review? The kind where you trace every query path, every edge case, every possible way data could leak between tenants? Where you think about what happens if someone forgets to pass the tenant filter? Where you consider whether the default behavior is secure-by-default or insecure-by-default?
That review took me two hours. Claude can help execute the fixes I identify, but it won’t spontaneously think “hey, multitenancy is a critical architectural decision that deserves paranoid scrutiny.”
Get multitenancy wrong and you’re on the front page of Hacker News, and not in the good way. Claude builds features. You build threat models.
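One concrete outcome of that review is making the tenant filter impossible to forget. A sketch of secure-by-default, using a plain Python stand-in rather than the actual Qdrant client API: a store whose query path refuses to run without a tenant.

```python
class TenantScopedStore:
    """In-memory stand-in for a shared vector store. The point is
    the contract: every read and write demands an explicit tenant_id,
    so 'forgot to pass the filter' fails loudly instead of leaking."""

    def __init__(self):
        self._rows = []  # list of (tenant_id, payload)

    def insert(self, tenant_id, payload):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._rows.append((tenant_id, payload))

    def search(self, tenant_id, predicate=lambda p: True):
        # Secure by default: there is no code path that queries
        # across tenants, so it cannot happen by accident.
        if not tenant_id:
            raise ValueError("tenant_id is required")
        return [p for t, p in self._rows if t == tenant_id and predicate(p)]

store = TenantScopedStore()
store.insert("tenant-a", {"doc": "a-secret"})
store.insert("tenant-b", {"doc": "b-secret"})
# tenant-a's queries can never surface tenant-b's rows.
```

The two-hour review is still yours to do. But wrappers like this shrink the number of query paths you have to trace from "all of them" to one.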
What The Discourse Gets Wrong
Here’s what bothers me about the AI discourse: both sides are missing the point.
The AI skeptics say “AI code is garbage, don’t use it.” That’s wrong. Claude Code is incredibly useful. I ship faster. I handle complexity I couldn’t handle alone. The leverage is real.
The AI evangelists say “AI will replace developers, just describe what you want.” That’s also wrong. Describing what you want is the easy part. Knowing what you should want — that’s where the experience lives.
Someone on r/ExperiencedDevs nailed it: “Coding is the boring/easy part. Typing is just transcribing decisions into a machine. The real work is upstream: understanding what’s needed, resolving ambiguity, negotiating tradeoffs, and designing coherent systems.”
The developers who thrive aren’t the ones who write the most code. They’re the ones who catch the multitenancy bug before it ships. Who know that in-memory rate limiting won’t scale. Who’ve been burned by synchronous endpoints and CDN-served CSS and proxies that aren’t wired up.
Experience isn’t knowing the syntax. Experience is knowing the failure modes.
What’s Actually Worth Learning
So what’s worth learning when AI can write the code?
Production Thinking
This is the big one. Every example above comes back to it: Claude builds for localhost, you build for production.
Concretely, this means developing instincts for questions like:
“What happens when there are multiple replicas?” (Rate limiting, session state, caching, file storage — anything in-memory becomes a distributed systems problem)
“What happens under load?” (Synchronous operations become timeouts. N+1 queries become database meltdowns. That “fast enough” endpoint becomes a bottleneck)
“What happens when dependencies fail?” (Database is slow. External API is down. Redis is unreachable. Do you degrade gracefully or explode?)
“What happens at 3am when nobody’s watching?” (Background jobs. Retry logic. Dead letter queues. The things that fail silently)
How do you learn this? You can’t shortcut it. You deploy things. You watch them break. You read post-mortems. You get paged. You develop a paranoid imagination for failure modes.
But you can accelerate it: before you ship, spend 10 minutes imagining the deployment. Draw the boxes. How many instances? What’s in front of them? Where’s the state? What’s shared? This exercise catches 80% of the issues I described above.
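The "dependencies fail" question has a code shape too. A hedged sketch (function names are illustrative): decide the fallback before the dependency goes down, not during the incident.

```python
def recommendations(user_id, fetch_live, fallback):
    """Degrade gracefully: if the live dependency fails, serve a
    stale or default answer instead of a 500."""
    try:
        return fetch_live(user_id)
    except Exception:
        # In production you would also log and emit a metric here,
        # so the silent 3am failure is not actually silent.
        return fallback

def flaky_service(user_id):
    # Stand-in for the external API that is down tonight.
    raise TimeoutError("recommendation service unreachable")

result = recommendations("u1", flaky_service, fallback=["top-sellers"])
# The user sees generic results instead of an error page.
```

The fallback is a product decision as much as a technical one, which is exactly why it won't appear unless someone asks for it.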
Ecosystem Intuition
This is knowing the zeitgeist of your stack — not just what’s possible, but what’s idiomatic. What the community actually uses. What’s deprecated but still in the docs. What’s new but not proven.
Concretely:
Read the GitHub discussions, not just the docs. That’s where deprecations get announced, migration paths get debated, and footguns get documented.
Follow the maintainers on Twitter/X or Bluesky. They’ll tell you about breaking changes before the docs catch up.
Lurk in Discord servers. The “how should I do X” discussions reveal what’s considered best practice.
Actually ship with the stack. The difference between “I’ve read about Supabase” and “I’ve shipped three apps with Supabase” is enormous.
The goal: when Claude suggests an approach, you can immediately sense whether it’s the “right” way or just “a” way. Fernet encryption vs Supabase vault. CDN Tailwind vs compiled Tailwind. Redis rate limiting vs in-memory rate limiting. These aren’t in the documentation. They’re in the collective experience.
Architectural Paranoia
Some decisions are easy to change later. Some aren’t. Knowing the difference is half of senior engineering.
The ones that are hard to reverse:
Multitenancy model: Shared database with tenant IDs? Separate schemas? Separate databases? Choose wrong and you’re rewriting everything.
Auth architecture: Where do tokens live? How do sessions work? What’s the refresh flow? Changing this later breaks every client.
Data model fundamentals: Relational vs document. Normalized vs denormalized. Adding a column is easy. Restructuring your entire data model is not.
API contract design: Once clients depend on your response shape, changing it is a versioning nightmare.
For each of these, Claude will happily implement whatever you ask. It won’t stop and say “are you sure about this? This is hard to change later.” That paranoia is your job.
My rule: for any architectural decision I can’t easily reverse, I spend at least an hour thinking about alternatives before I let Claude write the first line.
Verification Instincts
What should you test for? What’s easy to get wrong? What looks done but isn’t actually wired up?
This is pattern recognition from past failures. The cache that wasn’t being hit. The feature flag that was never checked. The error handler that swallowed exceptions silently.
Concretely:
Test the user flow, not just the units. My caching module passed all its unit tests. The integration was broken. If I’d tested “make this API call and verify Redis has an entry,” I’d have caught it immediately.
Verify your assumptions. Claude wrote the code, but did the code actually get used? Add a log line. Check the network tab. Confirm reality matches intention.
Break it on purpose. What happens when you pass invalid input? What happens when the database is slow? What happens when the auth token is expired? Claude tests the happy path. You test the sad path.
The underlying skill: developing a checklist of “things that can look done but aren’t” for your specific domain. Every time you get burned, add it to the list. Eventually, you check these instinctively.
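Here's what "test the sad path" looks like in miniature (the handler is hypothetical): assert on the failures, not just the success.

```python
import time

def handle_request(payload, token_expiry, now=None):
    """Toy request handler with the two failure modes worth testing."""
    now = time.time() if now is None else now
    if now >= token_expiry:
        raise PermissionError("token expired")
    if "id" not in payload:
        raise ValueError("missing required field: id")
    return {"ok": True, "id": payload["id"]}

def raises(exc, fn):
    try:
        fn()
    except exc:
        return True
    return False

# Happy path: what AI-written tests reliably cover.
assert handle_request({"id": 7}, token_expiry=2.0, now=1.0) == {"ok": True, "id": 7}

# Sad paths: what the scar tissue tells you to cover.
assert raises(PermissionError, lambda: handle_request({"id": 7}, token_expiry=1.0, now=2.0))
assert raises(ValueError, lambda: handle_request({}, token_expiry=2.0, now=1.0))
```

Three assertions, and two of them are the ones a happy-path test suite never writes.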
The Uncomfortable Conclusion
A freelance developer with 8 years of experience described a pattern he’s seeing across multiple clients: companies paying good money for internal software that barely works. Same symptoms every time. AI-generated comments. Algorithms that make no sense. Inconsistent patterns.
“Yes it mostly works,” he wrote, “but does so terribly to the point where it needs to be fixed.”
The era of AI slop cleanup has begun. And the people doing the cleanup are the ones who know what production actually looks like.
Claude builds for localhost. You build for production.
That gap is where your fifteen years live. And it’s not getting smaller.
This is the third essay in an accidental trilogy. First: comprehension debt is real. Second: clean code is dead. This one: the skills that matter more now, not less.

