Studies have shown that software engineers are very bad at judging their own productivity. When a software engineer feels more productive, the opposite is just as likely to be true. That's why anecdotal data can't be trusted.
Please provide links to the studies; I'm genuinely curious. I have been looking for data, but most studies I find showing an uplift are just looking at LOC or PRs, which of course is nonsense.
Meta measured a 6-12% uplift in productivity from adopting agentic coding. That's paltry. A Stanford case study found that after accounting for buggy code that needed to be re-worked, there may be no productivity uplift.
I haven't seen any study showing a genuine uplift after accounting for properly reviewing and fixing the AI generated code.
> Meta measured a 6-12% uplift in productivity from adopting agentic coding. That's paltry.
That feels like the right ballpark. I would have estimated 10-20%. But I'd say that's not paltry at all. If it's a 10% boost, it's worth paying for. Not transformative, but worthwhile.
I compare it to moving from a single monitor to a multi-monitor setup, or getting a dev their preferred IDE.
> ... just looking at LOC or PRs, which of course is nonsense.
That's basically a variation of "How can they prove anything when we don't even know how to measure developer productivity?" ;-)
And the answer is the same: robust statistical methods! For instance, amongst other things, they compare the same developers over time doing regular day-job tasks with the same quality-control processes (review etc.) in place, before and after being allowed to use AI. It's like an A/B test. Averaging across a large N and a long time window accounts for a lot of the day-to-day variation.
Note that they do not claim to measure individual or team productivity, but they do find a large, statistically significant difference in the data. Worth reading the methodologies to assuage any doubts.
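To make the before/after design concrete, here's a toy sketch of what that comparison boils down to: a paired test on the same developers. Every number below is invented for illustration (real studies use far larger N and control for task mix, seniority, etc.):

```python
import math
import statistics

# Hypothetical tasks closed per week for the SAME eight developers,
# before and after getting AI access. All figures are made up.
before = [4.1, 5.0, 3.8, 4.6, 5.2, 4.0, 4.4, 4.9]
after  = [4.6, 5.3, 4.1, 5.1, 5.5, 4.3, 4.8, 5.4]

# Pairing by developer cancels out individual skill differences;
# only the within-developer change matters.
diffs = [a - b for a, b in zip(after, before)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
n = len(diffs)

# Paired t-statistic: mean difference over its standard error.
t_stat = mean_diff / (sd_diff / math.sqrt(n))

uplift_pct = 100 * mean_diff / statistics.mean(before)
print(f"mean uplift: {uplift_pct:.1f}%  t = {t_stat:.2f}  (n = {n})")
```

The point is just that "we can't measure productivity" doesn't block this design: you don't need an absolute productivity number, only a consistent before/after proxy per developer.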
> A Stanford case study found that after accounting for buggy code that needed to be re-worked there may be no productivity uplift.
I'm not sure if we're talking about the same Stanford study; the one in the link above (100K engineers across 600+ companies) does account for "code churn" (ostensibly fixing AI bugs) and still finds an overall productivity boost in the 5-30% range. This depends a LOT on the use case (e.g. complex tasks on legacy COBOL codebases actually see a negative impact).
In any case, most of these studies seem to agree on a 15-30% boost.
Note these are mostly from the ~2024 timeframe, using the models from then without today's agentic coding harness. I would bet the number is much higher these days. More recent reports from sources like DX find up to a 60% increase in throughput, though I haven't looked closely at this and have some doubts.
> Meta measured a 6-12% uplift in productivity from adopting agentic coding. That's paltry.
Even assuming the lower end of a 6% lift, at Meta SWE salaries that is a LOT of savings.
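Back-of-the-envelope, with every figure below a made-up assumption (neither the headcount nor the cost comes from Meta), even the low end of the range is large in absolute terms:

```python
# Hypothetical back-of-the-envelope; none of these figures are real Meta data.
engineers = 10_000           # assumed engineering headcount
fully_loaded_cost = 400_000  # assumed $/engineer/year, fully loaded
uplift = 0.06                # low end of the quoted 6-12% range

# A 6% productivity lift is roughly "6% more output for the same payroll",
# valued here at fully loaded cost.
value_per_year = engineers * fully_loaded_cost * uplift
print(f"${value_per_year:,.0f} per year")
```

Under those (invented) assumptions the lift is worth hundreds of millions per year, which is why "paltry" percentages can still justify the spend at that scale.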
However, I haven't come across anything from Meta yet, could you link a source?
I guess it all comes down to what counts as a meaningful gain. I agree that 10-30% is meaningful, and if “software is a gas” this will lead to more software. But my expectations had become anchored to the frontier labs’ marketing (10x), and in that context the data was telling me that LLMs are a good productivity tool rather than a disruptor of human labor.
Yeah, unfortunately the hype is overwhelming, and it takes real work to figure out what the real impact is. At this point the gains are modest but still compelling.
On the other hand, we are still going through a period of discovering how to effectively use AI in all kinds of work, so the long-term impact is hard to extrapolate at this point. Fully AI-native workflows may look very different from what we are used to.
Looking at something like the Beads and Gas Town repos, which are apparently fully vibe-coded, is instructive because the workflow is very different... but the volume of (apparently very useful) code produced there by mostly one dude with Claude is insane.
As such, I can also see how this can become a significant disruptor of human labor. As the parent of a teen who's into software engineering, I am actually a bit concerned for his immediate future.
I don’t work in SWE, so I am just reacting to the claims that LLMs 10x productivity and are leading to mass layoffs in the industry. In that context the 6-12% productivity gain at a company “all in” on AI didn’t seem impressive. LLMs can be amazing tools, but I still don’t think these studies back up the claims being made by frontier labs.
And I think the 6-12% figure is from a 2025, not a 2024, study?
I think an OpenAI paper showed 25% of GPT usage is “seeking information”. In that case Google also has an advantage from being the default search provider on iOS and Android. I do find myself using the address bar in a browser like a chat box.
The productivity studies on software engineers directly don't show much of a productivity gain, certainly nowhere near the 10x the frontier labs would like to claim.
When including re-work of bugs in the AI generated code some studies find that AI has no positive impact on software developer productivity, and can even have a negative impact.
The main problem with these studies are they are backward looking, so frontier labs can always claim the next model will be the one that delivers the promised productivity gains and displace human workers.
> The productivity studies on software engineers directly don't show much of a productivity gain, certainly nowhere near the 10x the frontier labs would like to claim.
Which studies are you talking about? The last major study that I saw (that gained a lot of attention) was published half a year ago, and the study itself was conducted on developers using AI tools in 2024.
The technology has improved so rapidly that this study is now close-to-meaningless.
> The technology has improved so rapidly that this study is now close-to-meaningless.
You could have said that at any time in the last 3 years, but the data has never shown it to be true. Is there data showing that the current-gen models are so much better than the last-gen models that the existing productivity data should be ignored? I don't think the coding benchmarks show a step change in capabilities; it's generally dev vibes rather than a large change in measurements.
> We’ll unpack why identical tools deliver ~0% lift in some orgs and 25%+ in others.
At https://youtu.be/JvosMkuNxF8?t=145 he says the median is 10% more productivity, and looking at the chart we can see a 19% increase for the top teams (from July 2025).
The paper this is based on doesn't seem to be available which is frustrating though!
I think you are quoting productivity as measured before checking that the code actually works and correcting it. After re-work, productivity drops to 1%. Timestamp 14:04.
In any case, IMHO I think AI SWE has happened in 3 phases:
Pre-Sonnet 3.7 (Feb 2025): Autocomplete worked.
Sonnet 3.7 to Codex 5.2/Opus 4.5 (Feb 2025-Nov 2025): Agentic coding started working, depending on your problem space, ambition, and the model you chose.
Post Opus 4.5 (Nov 2025): Agentic coding works in most circumstances.
This study was published July 2025. For most of the study timeframe it isn't surprising to me that it was more trouble than it was worth.
But it's different now, so I'm not sure the conclusions are particularly relevant anymore.
As DHH pointed out: AI models are now good enough.
Isn’t this what Tao is addressing in the link: that LLMs haven’t encoded reasoning? Success in the IMO is misleading because the problems are synthetic, with known solutions that are subject to contamination (answers to similar questions are available in textbooks and online).
He also discusses his view on the similarities and differences between mathematics and natural language. Tao says mathematics is driven entirely by efficiency, so presumably using natural language to do mathematics is a step backwards.
I think people make comments on LLMs not being smart in reaction to the comments from the leaders of AI labs that LLMs are so smart they could/will lead to mass unemployment.
Are SWEs really experiencing a productivity uplift? When studies attempt to measure the productivity impact of AI in software, the results I have seen are underwhelming compared to the frontier labs’ marketing.
And, again, this is ignoring all the technical debt from produced code that is poorly understood, weakly reviewed, and of questionable quality overall.
I still think this all has serious potential for net benefit, and does now in certain cases. But we need to be clearer about spelling out where that is (webshit, boilerplate, language-to-language translation, etc) and where it maybe isn't (research code, legacy code, large codebases, niche/expert domains).
This Stanford study on developer productivity found zero correlation between developers’ assessments of their own productivity and independent measures of their productivity. Any anecdotal evidence from developers on how AI has made them more or less productive is worthless.