I will never understand why anyone wants to go through all this. I don't believe for a second this is more productive than regular coding with a little help from the LLM.
I got access to Kiro from Amazon this week and they’re doing something similar. First a requirements document is written based on your prompt, then a design document and finally a task list.
At first I thought that was pretty compelling, since it includes more edge cases and examples that you'd otherwise miss.
In the end, all that planning still resulted in a lot of pretty mediocre code that I ended up throwing away most of the time.
Maybe there is a learning curve and I need to tweak the requirements more tho.
For me personally, the most successful approach has been a fast iteration loop with small, focused problems. Being able to generate prototypes based on your actual code and explore different solutions has been very productive. Interestingly, I kind of have a similar workflow where I use Copilot in ask mode for exploration before switching to agent mode for implementation. It sounds similar to Kiro, but somehow it's more successful.
Anyways, trying to generate lots of code at once has almost always been a disaster, and even the most detailed prompt doesn't really help much. I'd love to see what the code and projects of people claiming to run more than 5 LLMs concurrently look like, because with the tools I'm using, that would become a mess pretty fast.
I doubt there's much you could do to make the output better. And I think that's what really bothers me. We are layering all this bullshit on to try and make these things more useful than they are, but it's like building a house on sand. The underlying tech is impressive for what it is, and has plenty of interesting use cases in specific areas, but it flat out isn't what these corporations want people to believe it is. And none of it justifies the massive expenditure of resources we've seen.
It’s not necessarily faster to do this for a single task. But it’s faster when you can do 2-3 tasks at the same time. Agentic coding increases throughput.
Until you reach the human bottleneck of having to context switch, verify all the work, presumably tell them to fix it, and then switch back to what you were doing or review something else.
I believe people are being honest when they say these things speed them up, because I'm sure it does seem that way to them. But reality doesn't line up with the perception.
True, if you are in a big company with lots of people, you won't benefit much from the improved throughput of agentic coding.
A greenfield startup, however, with agentic coding in its DNA, will be able to run loops around a big company with lots of human bottlenecks.
The question becomes: will greenfield startups, doing agentic coding from the ground up, replace big companies with the human bottlenecks you describe?
What does a startup, built using agentic coding with proper engineering practices, look like when it becomes a big corporation & succeeds?
That's not my point at all. It doesn't matter where you work: if a developer is working in a code base with a bunch of agents, they are always going to be the bottleneck. All the agent threads have to merge back into the developer thread at some point. The more agent threads, the more context switching has to occur, and the smaller the productivity improvement gets, until you eventually end up in the negative.
I can believe a single developer with one agent doing some small stuff and using some other LLM tools can get a modest productivity boost. But having 5 or 10 of these things doing shit all at once? No way. Any gains are offset by having to merge and quality check all that work.
I've always assumed it is because they can't do the regular coding themselves. If you compare spending months trying to shake a coding agent into not exploding too much with spending years learning to code, the effort makes more sense.
I'm in the same boat. I'm 20 years into my SWE career, and I can write all the things Claude Code writes for me now, but it still makes me faster and helps me deliver better-quality features (like accessibility, transitions, nice-to-have bells and whistles) I may not have had time for or even thought of otherwise. And all that with documentation and tests.
You spend a few minutes generating a spec, then agents go off and do their coding, often lasting 10-30 minutes, including running and fixing lints, adding and running tests, ...
Then you come back and review.
But you had 10 of these running at the same time!
You become a manager of AI agents.
For many, this will be a shitty way to spend their time.... But it is very likely the future of this profession.
Anyway… watch the videos the OP has of the coding live streams. That's the most interesting part of this post: actual real examples of people really using these tools in a way that is transferable and specifically detailed enough to copy and do yourself.
For each process, say you spend 3 minutes generating a spec. Presumably you also spend 5 minutes on PR review and merging.
You can't do 10 of these processes at once, because there are 8 minutes of human administration which can't be parallelised for every ~20-minute block of parallelisable work undertaken by Claude. You can have two, and intermittently three, parallel processes at once under the regime described here.
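A rough back-of-the-envelope using only the figures assumed above (3 + 5 minutes of serial human time per task, ~20 minutes of parallelisable agent time):

    # How many agents can one human usefully keep busy under these assumptions?
    human_minutes = 3 + 5   # spec writing + PR review/merge: serial, can't be parallelised
    agent_minutes = 20      # agent working unattended: this part parallelises fine

    # The human is the serial resource, so throughput tops out at one finished
    # task per 8 human-minutes, regardless of how many agents are running.
    tasks_per_hour = 60 / human_minutes                                   # 7.5

    # Agents needed to keep the human saturated; beyond this they just idle.
    useful_parallelism = (human_minutes + agent_minutes) / human_minutes  # 3.5

    print(tasks_per_hour, useful_parallelism)

Which lines up with the two-to-three figure, before you even account for context-switching overhead.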
The number you have running is irrelevant. Primarily because humans are absolutely terrible at multitasking and context switching. An endless number of studies have been done on this. Each context switch costs you a non-trivial amount of time. And yes, even in the same project, especially big ones, you will be context switching each time one of these finishes its work.
That, coupled with the fact that you have to meticulously review every single thing the AI does, is going to obliterate any perceived gains you get from going through all the trouble to set this up. And on top of that, it's going to get expensive as fuck quickly on a non-trivial code base.
And before someone says "well you don't have to be that thorough with reviews", in a professional setting you absolutely do. Every single AI policy in every single company out there makes the employee using the tool solely responsible for the output of the AI. Maybe you can speed-run when you're fucking around on your own, but you would have to be a total moron to risk your job by not being thorough. And the more mission-critical the software, the more thorough you have to be.
At the end of the day a human with some degree of expertise is the bottleneck. And we are decades away from these things being able to replace a human.
How about a bug-fixing use case? Let an agent pick a bug from Jira, do some research and thinking, and set up the data and environment for reproduction. Let it write a unit test manifesting the bug (making it a failing test). Let it take a shot at implementing the fix. If it succeeds, let it make a PR.
This can all be done autonomously without user interaction. Many bugs can be fixed in a few lines of code and might be relatively easy to review. Some of these bug fixes may fail, may be wrong, etc., but even if half of them were good, this would absolutely be worth it. In my specific experience the success rate was around 70%, and the rest of the fixes were not all worthless either; they provided some more insight into the bug.
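A minimal sketch of what such a loop could look like. Everything here is hypothetical: Bug, fetch_open_bugs, run_agent and open_pr are placeholders for your Jira client, coding-agent CLI and code-hosting API (not real libraries), and the test suite is assumed to sit behind `make test`:

    # Hypothetical autonomous bug-fixing loop; all helpers below are placeholders.
    import subprocess
    from dataclasses import dataclass

    @dataclass
    class Bug:
        key: str
        summary: str

    def fetch_open_bugs(project: str) -> list[Bug]:
        return []  # placeholder: query Jira for open bugs in `project`

    def run_agent(prompt: str, cwd: str) -> None:
        pass  # placeholder: invoke your coding agent non-interactively in `cwd`

    def open_pr(bug: Bug, cwd: str) -> None:
        pass  # placeholder: push a branch and open a pull request for human review

    def tests_pass(cwd: str) -> bool:
        # Assumes the project exposes its test suite behind `make test`.
        return subprocess.run(["make", "test"], cwd=cwd).returncode == 0

    def attempt_fix(bug: Bug, cwd: str) -> bool:
        # 1. Reproduce the bug as a failing unit test.
        run_agent(f"Write a unit test that reproduces: {bug.summary}", cwd)
        if tests_pass(cwd):
            return False  # couldn't reproduce it, so a "fix" can't be trusted

        # 2. Let the agent take a shot at the fix.
        run_agent(f"Fix the bug so the new test passes: {bug.summary}", cwd)

        # 3. Only open a PR if the whole suite is green again.
        if tests_pass(cwd):
            open_pr(bug, cwd)
            return True
        return False

    for bug in fetch_open_bugs(project="MYPROJ"):
        attempt_fix(bug, cwd=f"/work/{bug.key}")

The point of requiring the failing test first is that it acts as a cheap filter: attempts that never reproduced the bug, or that leave the suite red, never reach a human reviewer.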
There is a chunk of devs using AI not because they believe it makes them more productive in the present, but because it might do so in the near future thanks to advances in AI tech and models. Others do it because they think their bosses might require them to work this way at some point, so they can show preparedness and give the impression of being up to date with how the field evolves, even if in the end it turns out it doesn't speed things up that much.
That line of thinking makes no sense to me honestly.
We are years into this, and while the models have gotten better, the guardrails that have to be put on these things to keep the outputs even semi-useful are crazy. Look into the system prompts for Claude sometime. And then we have to layer all these additional workflows on top... Despite the hype, I don't see any way we get to this actually being a more productive way to work anytime soon.
And not only are we paying money for the privilege of working slower (in some cases people are shelling out for multiple services), but we're paying with our time. There is no way working this way doesn't degrade your fundamental skills and, maybe worse, your understanding of how things actually work.
Although I suppose we can all take solace in the fact that our jobs aren't going anywhere soon, if this is what it takes to make these things work.
And most importantly, we're paying with the degradation of our own brains and skills. Once all these services stop being subsidised, there will be a massive number of programmers who can no longer code.
I'm sorry to be blunt here, but the fact you're looking at idiotic use of Claude.md system prompts tells me you're not actually looking at the most productive users, and your opinion doesn't even cover 'where we are'.
I don't blame people who think this. I've stopped visiting AI subreddits because the average comment and post is just terrible, with some being straight-up delusional.
But broadly speaking, in my experience, either you have your documentation set up correctly and cleanly, such that a new junior hire could come in and build or fix something in a few days without too many questions, or you don't. That same distinction seems to cut between teams who get the most out of AI and those that insist everybody must be losing more time than they gain.
---
I suspect we could even flip it around: the cost it takes to get an AI functioning in your code base is a good proxy for technical debt.
The claim I was responding to was that some people use our friends the magic robots not because they think they are useful now, but because they think they might be useful in the future.