A big problem was that he couldn't build a mental model of how the code behaved at runtime, in particular the lifetimes of data and objects: what gets created or destroyed when, in what sequence things happen, what exists for the whole runtime of the program vs. what's a temporary resource, that kind of thing.
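To make that distinction concrete, here's a minimal Python sketch (the names are made up for illustration) of the two kinds of lifetime he couldn't keep straight: one object created at startup that lives for the whole program run, and one temporary resource that only exists inside a single call.

```python
# Hypothetical illustration: a program-lifetime object vs. a temporary resource.


class Config:
    """Created once at startup and kept alive until the program exits."""

    def __init__(self, path):
        self.path = path


def count_lines(config):
    # Temporary resource: the file handle only exists inside this `with`
    # block and is closed automatically as soon as the block ends.
    with open(config.path) as handle:
        return sum(1 for _ in handle)


def main():
    config = Config("data.txt")   # lives for the whole program run
    print(count_lines(config))    # the file handle is already closed by now


if __name__ == "__main__":
    main()
```

If you can't place each piece of a codebase on that spectrum, you can't reason about when things exist or in what order they happen.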
The overall "flow" of the code didn't exist in his head, because he was basically taking small chunks of code in and out of ChatGPT, iterating locally wherever he was and the project just sort of growing organically that way. This is likely also what make the ChatGPT outputs themselves less useful over time: He wasn't aware of enough context to prompt the model with it, so it didn't have much to work with. There wasn't a lot of emerging intelligence a la provide what the client needs not what they think they need.
These days, tools like aider transparently prompt the model with a repo map etc. in the background, but in 2023/24 that infra didn't exist yet, and the context windows of the models at the time were also much smaller.
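For context, a repo map is roughly a compressed index of the codebase that gets prepended to the prompt so the model sees the shape of the whole project, not just the pasted snippet. A minimal sketch of the idea in Python (an illustration of the concept, not aider's actual implementation):

```python
# Sketch of the "repo map" idea: walk the repo, pull out top-level
# definitions, and build a compact summary to prepend to the model prompt.
# Not how aider actually does it, just the concept.
import ast
import pathlib


def repo_map(root="."):
    lines = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files that don't parse
        names = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        if names:
            lines.append(f"{path}: {', '.join(names)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(repo_map())
```

Back then, he would have had to assemble and maintain that kind of context by hand, which is exactly the part he was skipping.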
In other words, the evolving nature of these tools might lead to different results today. On the other hand, if that infrastructure had existed back then, chances are he'd have become even more reliant on the tools. The open question is whether there's a threshold where it just stops mattering - if the results are always good, does it matter that the human doesn't understand them? Naturally I find that prospect a bit frightening and creepy, but I assume some slice of the work will start looking like that.
The overall "flow" of the code didn't exist in his head, because he was basically taking small chunks of code in and out of ChatGPT, iterating locally wherever he was and the project just sort of growing organically that way. This is likely also what make the ChatGPT outputs themselves less useful over time: He wasn't aware of enough context to prompt the model with it, so it didn't have much to work with. There wasn't a lot of emerging intelligence a la provide what the client needs not what they think they need.
These days tools like aider end up prompting the model with a repo map etc. in the background transparently, but in 2023/24 that infra didn't exist yet and the context window of the models at the time was also much smaller.
In other words, the evolving nature of these tools might lead to different results today. On the other hand, if it had back then chances are he'd become even more reliant on them. The open question is whether there's a threshold there where it just stops mattering - if the results are always good, does it matter the human doesn't understand them? Naturally I find that prospect a bit frightening and creepy, but I assume some slice of the work will start looking like that.