Post-Agentic Code Forges

I think you got a lot right here. Setting aside the human/AI debate, there is a clear demand for a quickened pace of collaboration. I'd argue that demand has existed since before agents. Git was already a very coarse system that strongly discouraged two humans from collaborating on a change, since only one would get credit for the work. This leads to the stupid "joysticking" pattern of PR review, where the reviewer knows exactly what they want the PR submitter to do but still forces the submitter to make exactly the set of tiny tweaks the reviewer wants. I hate when people do this to me. Being a robot for someone else is not a valuable use of my time. But it's Git that forces people into this corner by not having a fine-grained enough way to represent the true nature of the collaborative effort for posterity.

If the AI explosion has proved anything, it's that people are now determined to move faster, to cut through the layers of red tape and ship. For myself, I know that doing this has little to do with whether or not you're using AI. If you take tiny shuffling steps, maximally covering your ass at every stage, then you'll go almost nowhere. For example, you might create overhead by demanding a beautiful git history, with bug fixes separated from improvements separated from feature work, so that in practice most small bugfixes are permanently backlogged. You might have engineers waiting days for review before they can land code, or you might have AIs generating 10,000 lines of new tests with each feature PR. If you need to walk 1 mile, taking 4-inch steps will get you there, slowly. If you need to walk 1000 miles, 4-inch steps are clearly ridiculous.

Funny enough: I'm rapidly converging on the language of machine learning: gradient descent. The real trick is to size your steps roughly proportional to the distance you have left to travel. If you know you have a long ways to go, it's time to throw the rulebook out the window and let your humans ship ship ship ship! Ship experiments, ship broken stuff, throw things at the wall and see what sticks. Make progress in flying leaps.
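To put rough numbers on the walking analogy, here is a minimal sketch in Python. It compares fixed 4-inch steps with steps sized as a fraction of the remaining distance, which is how plain gradient descent behaves on a simple quadratic bowl (the gradient, and so the step, shrinks in proportion to the distance from the minimum). The function names, the 50% fraction, and the 4-inch stopping tolerance are assumptions chosen for illustration, not anything from the post.

```python
import math

INCHES_PER_MILE = 63_360  # 5,280 feet * 12 inches


def fixed_steps(distance_in: float, step_in: float) -> int:
    """Number of fixed-size steps needed to cover the distance."""
    return math.ceil(distance_in / step_in)


def proportional_steps(distance_in: float, fraction: float, stop_in: float) -> int:
    """Steps needed when each step covers `fraction` of the remaining distance.

    Stops once the remaining distance is at most `stop_in` inches.
    """
    remaining = distance_in
    steps = 0
    while remaining > stop_in:
        remaining -= fraction * remaining  # the step shrinks as the goal gets closer
        steps += 1
    return steps


one_mile = INCHES_PER_MILE
thousand_miles = 1000 * INCHES_PER_MILE

print(fixed_steps(one_mile, 4))        # 15840 steps of 4 inches
print(fixed_steps(thousand_miles, 4))  # 15840000 steps of 4 inches
print(proportional_steps(thousand_miles, fraction=0.5, stop_in=4))  # 24 steps
```

Under these assumptions, the fixed-step walker needs about 15.8 million steps for 1000 miles, while the walker who covers half the remaining distance each time gets within 4 inches of the goal in 24 steps, taking huge leaps early and small ones only at the end.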

What makes me so sad about the AI era is that so few people are willing to move in leaps. Even companies that (in theory) want to create radically new things have just taken existing things (like VS Code or GitHub) and treated these decrepit decade-old designs as 99% perfect gospel. The result is that we're just shuffling forward with the same old junk, except that now employers want human-AI hybrids that shuffle-step at 60 Hz. Even the fast-shuffling zombies can't catch a person who is allowed to stretch their legs out and *run*, though.

This post misses two very important benefits of code review:

* sharing knowledge

* ensuring maintainability

I think those points are covered inside (2), which is essentially the bus factor. I'm arguing that those problems are getting cheaper as agents get faster and better.

Three issues with agents getting better:

* sharing knowledge: people learn, agents don't

* ensuring maintainability: LLMs are very bad at being consistent, due to inherent randomness, and due to the fact that they do not learn

* context size: I'm afraid that we will soon hit a barrier in increasing context size to make agents understand larger codebases, and LLMs do not form models of behavior like we do

If you don't understand the code that the agent produced, how can you be sure that it does what it is meant to do?

With agents, you have replaced the problem of writing code in a precise programming language with the problem of describing it in imprecise natural language, and the problem of writing code with the problem of reviewing code.

Personally, I believe those 3 issues are solvable and are getting solved. I have seen much progress on (1) and (3) over the last 2 years. I suspect (2), the LLM's reliability and accuracy, can be improved by several techniques that are generally grouped under the "inference compute" or "using tools" umbrella. In my experience, I was able to get very good results on problems that senior engineers struggle with in large, old codebases by simply giving the LLM the ability to run tests to validate that its changes work.

With that said, I do very much agree with you that we are not "there" yet today. And human-in-the-loop engineers are still very much needed.
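The "let the LLM run tests" idea above can be sketched as a small feedback loop. This is only an illustration under stated assumptions: `run_tests` and `propose_change` are hypothetical callables standing in for a real test runner and a real agent call, and `max_rounds` is an arbitrary safety limit, not something from this thread.

```python
from typing import Callable, Tuple


def fix_until_green(
    run_tests: Callable[[], Tuple[bool, str]],
    propose_change: Callable[[str], None],
    max_rounds: int = 5,
) -> bool:
    """Alternate between running tests and asking the agent for a change.

    run_tests      -- hypothetical: returns (passed, output_text)
    propose_change -- hypothetical: receives the failing test output and
                      edits the code (for example, by calling an LLM)
    Returns True if the tests pass within max_rounds change attempts.
    """
    for _ in range(max_rounds):
        passed, output = run_tests()
        if passed:
            return True
        propose_change(output)  # feed the failure output back to the agent
    # Check the result of the final attempted change.
    passed, _ = run_tests()
    return passed


# Usage with stand-ins: a fake "codebase" with two bugs, and a fake agent
# that fixes one bug per round.
state = {"bugs": 2}


def fake_tests() -> Tuple[bool, str]:
    return state["bugs"] == 0, f"{state['bugs']} failing"


def fake_agent(test_output: str) -> None:
    state["bugs"] -= 1


print(fix_until_green(fake_tests, fake_agent))  # True
```

The point of the loop is that the test suite, not the model, decides when the work is done, which is one plausible reason this setup improves reliability. A human still has to judge whether the tests themselves check the right thing.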

Son’s Substack
