AI Code Binge: The Next 50k Lines

I’ve continued to work on the project I discussed in the last post, and it’s now well into the “real project” range. At the moment it’s about 150k lines of code, after 574 commits, 298 PRs, and 300 issues. Many of those PRs and issues have come recently as I’ve adapted my workflow, which is what I’ll discuss in this post.

I took a few days off to work on other things, and when I came back to the project I took stock and didn’t really feel like the simple chat->code->test loop was cutting it as the system got more complicated and capable. At 150+ pages, the design and architecture docs were too unwieldy for both me and Claude to reason about with scattershot ideas and feedback. In a real team this would be where you spread the work out so people aren’t stepping on each other’s digital toes, but this is a different shape of that problem. I could fire off random bugs and ideas and have it build things, but you either do that serially, which is slow, or in parallel, which causes lots of merge conflicts and duplicated effort.

Themed Versions

Side Note: Experienced devs and managers will start to notice a theme here: we already know how to manage this type of stuff; it’s the same set of practices we’ve used to manage complex projects for decades. It’s just not totally obvious up front, or even in the middle, how to apply them, and where things can or should be different.

The first thing I did was to tag what I had as 0.1.0. Then I brain dumped everything I had queued up from big ideas to small ideas, and with Claude started to cluster these into themes. Then we mapped those themes to versions in a roadmap, with practical criteria that would define the goal of each version.

Then we drilled down into the next version, addressing specific details, refining the scope, etc. Ideas kept coming from all over, but if they didn’t fit in this bucket, they simply got parked on the roadmap. With a tight scope, Claude breaks the version down into tasks, which it does very well. These tasks are pretty well specified, with background, acceptance/test criteria, related tasks, and medium-level implementation details (e.g. table names, but not file names).

Review Time

Things are getting tricky enough now that I don’t want to do the “commit then review” approach; I want to try reviews prior to merging. I started handing the issues off to agents and having them send PRs. I also have them review each other’s work, and while they’re mostly writing good if not excellent code, the reviews catch enough real problems that I consider them worthwhile.

So now I’ve got some agents writing, some reviewing, some addressing feedback, and others trying to merge good PRs. I didn’t expect this to work, but I had to see where it was going to fail, which it quickly did: endless merge conflicts, agents deleting stacked branches, conflicting details and formatting. Gemini struggled with large conflicts; it would try to fix them and eventually succeed, but it took forever. Codex would realize what it had just stepped into and often take a more surgical approach, directly re-applying the changes to a clean main. Claude did OK, but was much slower than Codex. Claude tends to repeat obvious mistakes, so I had to add some guardrails to its MEMORY.md, which I haven’t yet had to do with the others.

Oops: Formatting and Linting

I quickly realized I had never set up any strict linting and formatting. I implemented this, which invalidated a dozen or so pending PRs that were in merge hell, so I just closed those out. I did a full formatting/linting pass with pretty much every option turned on, then re-did those PRs. This would be utterly demoralizing for human coders, but the bots don’t care and it all took a couple of hours to get back on track.

More Merging > Less Merging?

The formatting helped with the conflicts, but the agents were still butting heads frequently. I slogged through wrapping up that version, which took a couple of nights, and then took a new approach for the next one. This time I had the issues more strictly organized using GitHub sub-issues. I had also started to realize that Codex was consistently doing everything way faster than the others, and at similar or possibly even higher quality for the smaller-scoped tasks. Also, Codex on the $20 plan seems to get more coding work done than Claude on the $125 plan, which I frequently exhaust even using mostly Sonnet for coding tasks.

I told Codex to send me a PR for all open issues, of which there were about 48. It cranked for a while and finished the job. Then I had Claude review all of the PRs, leaving feedback as comments. Then Codex addressed all of the comments. For a human coder, this would be kind of a bonkers approach, but it worked well. Finally, I had Codex merge the PRs in batches, in whatever order it deemed appropriate. After each merge it would run the quicker tests, and after each batch it ran the full tests: unit, integration, and E2E, which had already passed on push, so they weren’t going to be far off. Halfway through, I pulled and built the app myself and tested the new stuff; then it finished. Overall this approach handled more work in a lot less time. I don’t know how far this scales, but 48 issues is a pretty good-sized chunk of work in terms of planning and effort on my part, so I’m not sure I need to go much beyond that.

Phases

I used that approach for two versions and it worked well, but I made a tweak on the most recent one. This version lent itself well to phases, so instead of a giant batch of 40 issues it became about 6 batches of 5-8 issues/PRs each. The throughput is a bit lower, but it handles drift between phases better: feedback on the second issue that affects the 30th doesn’t cause headaches, because the second is already merged by the time the 30th starts. I think the optimal batch size can vary depending on the focus of the version, but this feels like the best approach I’ve tried so far.

Syncing Up

I forgot to do this for a couple of versions, but figured the design had probably drifted a bit from the implementation, since some of the decisions lived only in the roadmap or the issues. I asked Claude chat to check a few key docs to confirm this, which it did. Then I asked Claude Code (Opus) to review everything in detail; it fired up a bunch of subagents and … immediately ate my entire 5-hour Max quota in about 20 minutes of sifting through code. It eventually finished in the next window, and did a great job, but boy is it a token monster for that type of work. I tried it again on the next version and it didn’t consume the whole quota, so I think it’s something to do either in chunks or frequently.

Random Bugs, No Backlog

If I come across an actual bug while using or testing the app, I’ll describe it to my planning Claude session and it will file the bug on my behalf, unless it’s a symptom of a larger change, in which case it goes on the roadmap. I do not have a backlog of issues: if anything is an issue, it gets picked up and fixed. This is an anti-pattern for human developers, but it’s a lot easier to just tell the agents to fix all open issues than to deal with tags and milestones and versions. When we plan the next version, these changes get incorporated there.

Overall this feels like a pretty sustainable and productive method. It’s not as exciting or tiring as the initial burst, but it is very elastic. I can spend 10 minutes and kick off a chunk of work, or I can spend 3 hours and keep things moving while designing the upcoming work. This project is far from done, so we’ll see how we do with a progressively larger and more complex environment.

AI Code Binge, February 2026 Style

I recently took a week off work to recharge and ended up going on a bit of a binge planning and building out a new system. It gave me a chance to explore some non-Gemini/non-Google stuff with more energy than I’m normally able to in my spare time, and I figured I’d share some thoughts:

Models

No real surprises here, but Opus 4.6 is really fantastic at planning and reviewing. Sonnet 4.6 doesn’t seem to produce any worse code than Opus, but it does make more mistakes when it comes to decisions. Codex 5.3 is by far the fastest, and also the most focused and direct. Gemini is faster than Claude and tends to take a more meandering/thorough route than Codex. Opus feels like the best partner of the batch in terms of design, but they all have useful aspects. Where I’ve settled is to iterate in Opus and periodically run it by ChatGPT and Gemini for feedback, which has been fruitful. I’ve done most of the early coding work with Claude because Claude Code is just a little ahead of the other CLI tools, but the models are all good enough for pretty much anything.

Workflow

This was a greenfield project and is now about 100k lines, so it went through a few phases pretty quickly over ~30 hours. I spent a lot of time in a chat session just planning it out before building anything, so I started the build with a 60+ page design doc and a similarly sized architecture doc that I’d iterated on over probably ~10 hours. Claude came up with a pretty good phased approach, so I had it work through this for a few steps. For the first couple of phases I kept a tight leash, but after a while, once enough patterns were established, I went to YOLO.

As I iterated, I would use the Claude chat, which was now managing the docs in GitHub on a branch. This made it much easier to review the PRs that resulted from the decisions, to make sure there weren’t any side effects or lossiness. The chat creates GitHub issues based on the changes. Then I go to Claude Code/Codex/Gemini and tell it to fix a specific issue, or just fix them all. Claude takes 10-20 minutes to handle most things, up to 40 for bigger batches or bigger changes. Sometimes it does them in parallel, sometimes not; I don’t think it’s really dialed in yet on where to split things up, but it errs on the side of serial, so it almost never conflicts with itself.

Code

I don’t review the code closely, but I do read it, and it all looks really good. There aren’t many examples of the issues we’ve come to expect from these things: no significant overcommenting, no multiple versions of the same thing, no naively structured files/classes. I think this is a combination of:

  1. The models getting better
  2. Starting from scratch, no legacy decisions to consider or tech debt or “this is how we used to do it”.
  3. Having a thorough (though not formal in any sense) design and architecture spec with derived artifacts like roadmaps. Major changes are tracked in ADRs, and it has only tried to undo one of those once.

Context window and compacting are challenges at this point for design, less so for coding, as the fairly rigorous design approach yields tighter iteration loops, scopes, and smaller blast radii for changes.

Biology

I’m not tooting my own horn here, as this is much more “this is what these things can do if you let them”, but what I’ve built in a week, in terms of capabilities, polish, and raw metrics (200+ pages of design/docs/tutorials, 100k lines of code, 1k+ tests, dozens of E2E tests), is way beyond 10x. I’m a fairly prolific coder when possible and a good big-picture thinker, but keeping up with this has been exhilarating and exhausting in a novel way. It’s less like a creative Flow state where time slips away and more like a good video game: “just one more feature” feels a lot like “just one more quest”. I don’t think I could keep this up indefinitely, or it would at least take a while to adapt. A typical session looks like this:

  1. Run through the app, trying previous/new things, typing notes into the design chat.
  2. Iterate a bit there, it updates docs, creates issues.
  3. Have the agent work on the issues.
  4. Repeat, doing step 1 while the previous iteration of step 3 is happening.

The step change is that this is a ~30 minute cycle, not a 2-3 week sprint, and these can be pretty significant or deep changes. It’s literally building things faster than you can design and try them (not even including the self-improvement loop). And it’s doing them well; this isn’t a simple project, and it’s not producing garbage code. It’s novel because it’s more productive than Flow but also less comfortable. I’ve only been spending 3-4 hours a day on it, and my brain and dopamine circuits still haven’t figured out how to react, so you end up in a contradictory state of doing smart things with your lizard brain. That said, it’s been really fun and I recommend trying it if you can!

How to Clean a Room

Cleaning a room is one of those things that almost everyone learns, but we all learn it differently, and we all do it differently. You would think that by the time we’re adults, we’ve done it enough that there’s barely a “how” to even think about. And yet sometimes, like this past weekend, I stand in a messy room and think of 10 other things I’d rather do. This leads to excuses like “I don’t even know where to start”, which then sound ever more plausible, and, as the mess’s existence proves, that argument usually wins and I’m off to do one of those 10 things.

This time, being me, my mind, possibly inspired by one of Boston Dynamics’ recent videos, wanders to think how I’d program a robot to do it. So I come up with a program.

This program has a few overarching rules. First, you can’t go backwards: when you finish cleaning one part, you can’t put anything else there. You can stash forwards but not backwards. Second, you can’t go backwards into another room either; no putting stuff on the floor of the hallway. Last, anything you touch that doesn’t belong in the room can’t stay in the room.

  1. Floor. Get everything off it. Doesn’t matter where you put it, just follow the constraints. We do this first because now we can walk around and should be able to reach pretty much everything.
  2. Seating. Clear it. As you progress, things you need to deal with tend to get smaller, so you might want to sit down and sort them out. Or just take a break.
  3. Beds. Clear them. It’s tempting to pile stuff here, but you don’t want to sleep on the floor, and the steps after this aren’t going to be very comfortable either.
  4. Surfaces. Desks, benches, counters. This is harder because some stuff actually lives here—a lamp, a project, whatever. Sort what belongs from what just landed there.
  5. Shelves. Open cabinets too. The pace slows but victory is close. Organize what stays, remove what doesn’t. After this step the room will look clean.
  6. Drawers. Closets, cabinets with doors. Deep clean territory. Sort through everything: what belongs, what doesn’t, what’s just been hiding. We’re getting nitpicky and the distractions are making a stronger case for your time, but it’s going to be worth it. Your future self will thank you.

OK, but wait, there are two shelves, which one first? At each step, if there are choices to be made, you work around the room counterclockwise from the main entry point, bottom to top. Why? Because I said so. The whole point of this program is to be deterministic and remove those decisions that allow other things to creep in. Remember, you’re a robot.
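Since we’re pretending to be robots anyway, the steps above can be sketched as actual code. This is just a toy: the zone order and rules come straight from the list, but the function names, the item representation, and using a plain sort as a stand-in for “counterclockwise from the entry, bottom to top” are all mine:

```python
# A deterministic room-cleaning "program". Zones are processed in a fixed
# order (never go backwards), and within a zone a fixed tiebreak removes
# any remaining decisions -- the whole point is that there's nothing to decide.

ZONES = ["floor", "seating", "beds", "surfaces", "shelves", "drawers"]

def clean_room(room, belongs):
    """room: dict of zone -> list of items found there.
    belongs: set of items that actually live in this room.
    Returns (kept, removed): items organized in place vs. taken out entirely."""
    kept, removed = [], []
    for zone in ZONES:                           # rule: never go backwards
        for item in sorted(room.get(zone, [])):  # deterministic tiebreak
            if item in belongs:
                kept.append((zone, item))        # organize it where it lives
            else:
                removed.append(item)             # touched + foreign => leaves the room
    return kept, removed
```

Running it on a small mess, `clean_room({"floor": ["sock", "mug"], "shelves": ["book"]}, {"sock", "book"})` sends the mug out of the room and organizes the sock and book in place. Remember, you’re a robot.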

So you just read a blog post by an (alleged) adult, telling other adults in excessive and deterministic detail how to do something we’ve all been doing since childhood. If you’re an engineer like me you might feel seen. If you’re not an engineer you might be making the Nick Cannon meme face and thinking “who thinks like this?” Fair. I think about that gap a lot: how some minds work so differently that it’s hard to even understand, much less empathize. That’s what originally seeded this post, but this bit emerged instead. More on that another time.

Have you ever “programmed” a task we’re just supposed to know how to do?

AI, Art & Mortgages

I want to start by acknowledging that this is a topic that directly affects people’s livelihoods. Real people are losing real work to generative AI right now, and that matters. I’m not going to pretend this is purely an abstract or anonymous philosophical debate. Also I have enjoyed every Sanderson book I’ve read and have no beef with him, he’s simply a target of his own making here by communicating clearly.

That said, I’ve been struggling with this topic because I can’t find a clean position. The conversation around AI and art tends toward extremes: either it’s theft and should be banned, or it’s a tool like any other and everyone should embrace it. I’m not comfortable on either end. There are too many layers and angles, and I think flattening them into a simple take does a disservice to everyone involved.

The clearest version of the anti-AI argument I’ve encountered comes from Brandon Sanderson. His thesis, roughly: the struggle is the art. The book you write isn’t really the product, it’s a “receipt” proving you did the work. You become an artist by writing bad books until you write good ones. The process of creation changes you, and that transformation is the actual art. LLMs can’t grow, can’t struggle, can’t be changed by what they make. So they can’t make art.

It’s a thoughtful position. But I think it’s also circular. He’s defined art as the process of struggle, but the audience doesn’t experience your struggle. They experience the output. Nobody listening to an album knows or cares whether it took a week or three years to record it. They care if it moves them. When I read Mistborn (which I enjoyed!), I’m not feeling Sanderson’s growth journey from White Sand Prime through six unpublished novels that I never read. I’m feeling the story he eventually learned to tell.

“Put in the work” is real advice and I believe in it deeply. But the work is how you get good, not why the result matters to anyone else. Those are different things. Conflating them feels like asking the audience to subsidize your growth journey.

Subsidy

And maybe that’s what some of the anger is actually about. AI threatens the subsidy.

The middle tier of creative work (background music, stock photography, commercial illustration, session gigs) was never really about profound artistic growth. It was a way to pay the mortgage while developing your craft on nights and weekends. You do the pedestrian work that keeps the lights on, and that buys you time to make the art you actually care about. AI competes in that middle tier directly, and it’s winning.

That’s a real economic disruption, and I don’t want to minimize it. But framing it as “AI can’t make art because it doesn’t struggle” is a philosophical dodge of an economic problem.

That model isn’t ancient. It’s maybe 50-80 years old. The session musician, the stock photographer, the commercial illustrator working on their novel at night, these are 20th century inventions. Before that, you had patrons, or you were wealthy, or you just didn’t make art professionally. The “starving artist” is a well-known trope, but the “starving artist who does commercial work to fund their real art” is a much more recent arrangement. But there were also far fewer artists, with a lot more gatekeeping, so I’m not arguing that everything was great before then either.

“I did it myself”

There’s also the provenance argument, that AI is trained on copyrighted work without consent or compensation. And that’s a real concern. But virtually all musicians learned to play and write by listening to and studying other musicians. There’s no system to track that provenance or pay royalties unless it’s a nearly-direct copy. The line between “learned from” and “trained on” is blurrier than it feels.

That said, I don’t want to dismiss the emotional weight here. Feeding your art and creativity into a machine with no credit—while some corporation profits from it—is a tough hit to the ego, not just the bank account. That’s a legitimately hard thing to get past, and I hope we find a better solution for it. The current arrangement feels extractive in ways that don’t sit right, even if I can’t articulate exactly where the line should be.

Sanderson said “I did it myself” referencing his first novel that he hand-wrote on paper. This feels cringeworthy to me, because in no way is he doing it himself. That first novel had thousands of contributors, from his parents and teachers to stories he read, conversations he had about it, movies he watched and so on.

This connects to something my thoughts keep coming back to: we’re always in the middle. Most people like to think of their place in a creative effort as the beginning or the end; the origin of something new, or the final word on something complete. But nobody starts from zero. The most original ideas are still cued by experiences. The most original inventions are still spurred by problems. Your inputs came from somewhere.

And it goes the other direction too. If we write the book, people still need to read it. If we compose the song, someone still needs to hear it. Our outputs are someone else’s inputs, often without permission, credit, or compensation. The chain keeps going.

Sanderson’s framing puts the artist at the center as the origin point of authentic creation, forged through struggle. But if we’re all in the middle, if every artist is just transforming their inputs into outputs that become someone else’s inputs, then the question of whether the transformer “struggled” feels less central. The chain of influence extends in both directions, through every artist who ever lived, and will continue through whatever comes next.

Starving Engineers

And then there’s the scope problem. Generated music is bad but generated code is fine? Generated paintings are theft but generated infographics are helpful? The reactions seem to track with how much cultural romance we attach to the craft. Software engineering has no “starving engineer” mythology; nobody thinks “I suffered for my art” when I debug a race condition. So when AI writes code, it’s a tool. When it writes songs, it’s an existential threat.

Photography is worth remembering here. In the 1800s, critics argued photography wasn’t art because it merely captured what already existed. Some said copyright should go to the subject, or even to God, not the photographer. It was too easy, just thoughtlessly press a button.

But over time, people figured out that taking a photo wasn’t a mundane task. Good photographers could be in the same place with the same equipment and consistently create images that moved people. The tool became a medium. Mastery emerged.

I think AI will follow a similar path. Right now most people are still tinkering, having mixed results. But we’re starting to see glimpses of people getting genuinely good at it, comfortable enough that they can do things most people can’t, or never thought of. They’ll convey ideas and emotions in new ways. They’ll be drawing on the collective contributions of thousands of generations of prior artists, just like every artist always has.

I don’t have a clean conclusion here, and I’m not sure anyone should right now. The displacement is real. The ethical questions around training data are real. The cultural anxiety about what counts as “real” art is real. I can’t join the strong positions on either side, because I think we’re very early in a journey that will outlive all of us.

What I am is cautiously optimistic. The history of art is full of new tools that were rejected as cheating until people learned to master them. The history of technology is full of painful transitions that looked like apocalypses at the time and turned out to be recalibrations. I suspect this is one of those. I hope so, anyway. We won’t know for a while yet.

Building at the speed of … builds

I’ve been thinking about build speed lately, usually while waiting for builds, and I think the thing that’s underappreciated isn’t the raw numbers, it’s that different speeds are qualitatively different experiences. Faster is always better, but it’s far from a linear relationship.

Working on a package that builds in 100ms is basically invisible. You don’t even notice it’s happening. The feedback loop is so tight that it feels like the code is just doing what you told it to do. You’re in conversation with the machine and you are the bottleneck, which is the goal.

At 10 seconds, it’s disruptive, but if the tooling is set up well you can stay in flow. You wait. You’re still there when it finishes. You might even find a bit of rhythm or cadence here and get a little thrill from the anticipation like hitting a long fly ball and seeing if it makes it out.

At a minute, it’s more like someone tapping you on the shoulder to ask a question. Your attention wobbles. You notice you could use a coffee, or you tab over to email to check something “real quick.” Five minutes later you come back and the build failed two minutes ago. Now you’re reloading context.

At 10 minutes, it changes your whole relationship with the work. You start actively avoiding triggering builds. You’re trying to see how far you can get while holding your breath. If it fails at 9:30 you’re genuinely frustrated, and maybe you’ll just go find something else to do for a while.

The reason I think this matters is that people tend to look at build optimization as a spreadsheet exercise: spend 8 hours to save 30 seconds, amortize across however many builds, calculate break-even. Even if the math works out, it feels tedious, and while the other coders might thank you for a 5% reduction, the suits won’t.

I think that exercise misses the point entirely. The less quantifiable stuff pays back almost immediately. You’re more focused. You’re doing better work. You’re just happier. A developer who’s been trained by their feedback loop to flinch isn’t going to produce the same work as one who can iterate freely.

But AI

There’s an argument to be made that AI changes this calculus: that build speed doesn’t matter anymore because the AI is doing the building in the background and will let you know when it’s done. But I think it actually makes build speed more important, not less.

Since flow state and focus don’t matter as much with async coding, the math actually becomes meaningful and the small wins compound even further. If you’re coding at 1x speed and building every 10 minutes, and the build takes 2 minutes, you’re spending about 20% of your time waiting on builds. Annoying, but manageable.

Now imagine an AI coding at 10x. It wants to build every minute to verify its work. But the build still takes 2 minutes. Suddenly 66% of the time is build. The AI isn’t going to get frustrated and check its email, but it’s also not doing useful work during that time. And if you’ve got multiple agents running in parallel, that bottleneck adds up and leaves even more open loops to manage.
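That arithmetic is worth making explicit. A tiny sketch (the function is mine; the numbers are the two scenarios above):

```python
def build_fraction(work_minutes: float, build_minutes: float) -> float:
    """Fraction of each work+build cycle that is spent waiting on the build."""
    return build_minutes / (work_minutes + build_minutes)

# Human: ~10 minutes of coding between 2-minute builds.
human = build_fraction(10, 2)   # ~0.17 of the cycle is build time

# AI agent: wants to verify every minute against the same 2-minute build.
agent = build_fraction(1, 2)    # ~0.67 -- two thirds of the cycle is build time
```

The work interval shrank 10x but the build didn’t, so the build went from a rounding error to the dominant term.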

When you speed up one part of a pipeline, the bottleneck shifts somewhere else. AI sped up the coding. Now the build is often the bottleneck. If anything, that’s an argument for investing more in build speed than we did before, the returns are even higher when you’re trying to iterate faster.

The Middle

I think most people like to think of their place in a creative effort as the beginning or the end or even both, but the reality is that we’re always in the middle.

The most original ideas are still cued by experiences. The most original inventions are still spurred by problems. Nobody starts from zero, and I don’t just mean privilege or connections (though those count too). I mean the basic fact that your inputs came from somewhere, just like you did.

And it goes the other direction too. If we ship the software, people still need to use it. If we build the house, someone still needs to live in it. Our outputs are someone else’s inputs. The chain keeps going.

Once you accept this, something shifts. You don’t need to credit every influence or take responsibility for everything that happens downstream, but being aware that they exist opens your eyes. You start to see how your work can go places you didn’t expect, inform decisions you weren’t part of, generate ideas you won’t be around for. And it lets you rewind. If your “great idea” doesn’t work, it was always just one link in a chain, and you can go back and try a different path.

I think this actually reinforces your contribution rather than reducing it. We tend to put new ideas and great results on a pedestal and treat everything in between as an unavoidable burden. But if it’s all in between, if there is no pristine beginning or triumphant end, just the middle, then you have permission to appreciate and invest in the whole process. This may not get you on the front page or in the corner office, but I think it’s a clearer path to fulfillment and happiness.

Look Up First

If you put an architect or designer in an unfamiliar space and take a blindfold off, the first thing they’re likely to do is look up. They’re looking for load-bearing walls. For structure. For constraints.

They might not even be able to tell at a glance. But it’s so important that it’s worth a try. Why? Because it instantly reduces the number of possibilities from near-infinite to something tangible. The pattern recognition finds some surface area to grab onto. Ideas start to get bounced at the door, and the ones that don’t get bounced find space to flourish.

Point an experienced engineer at a codebase and they’re doing the same thing. What frameworks are you using, and which parts are you actually using? What are people depending on? What are your interfaces, your standards?

Not what version of the language it’s written in. Not tabs versus spaces. Those matter eventually, but not on day one. On day one, you need to know the shape of the thing, not the texture.

You don’t need the full blueprint in your head, but you need to know the important parts. You need to know which metaphorical walls you can drill into or knock down and which carry the roof or have plumbing and will blow your budget if you open them up.

One powerful way to do this: look at the abstractions the system makes. You can see which ones held up, which needed workarounds or patching. Which added value. What customers are relying on. What you can and can’t move or remove. The abstractions that survived are load-bearing now—whether they were good bets or just bets that got stuck.

If you can’t point to which code is load-bearing in your system, you don’t understand your system—even if your goal is to tear it down.

52 Word Review: One Battle After Another

One Battle After Another was well-written, well-paced, well-cast, well-acted and well-shot, and yet felt completely forgettable and somehow unoriginal.  There were echoes of Tarantino, Terminator 2, Easy Rider and many others, but that’s all it felt like.  This might be an accomplishment for other directors but for Anderson it was a disappointment.