
They don't publish it as far as I can see!

In any case, IMHO AI SWE has happened in three phases:

Pre-Sonnet 3.7 (before Feb 2025): Autocomplete worked.

Sonnet 3.7 to Codex 5.2/Opus 4.5 (Feb 2025-Nov 2025): Agentic coding started working, depending on your problem space, your ambition, and the model you chose.

Post-Opus 4.5 (Nov 2025 onwards): Agentic coding works in most circumstances.

This study was published July 2025. For most of the study timeframe it isn't surprising to me that it was more trouble than it was worth.

But it's different now, so I'm not sure the conclusions are particularly relevant anymore.

As DHH pointed out: AI models are now good enough.





Sorry for the late response!

My guess is they didn't publish it because they only measured it at one company, if they had the data across the cohort they would have published.

The general result that review/rework can cancel out the productivity gains is supported by other studies.

AI-generated code is 1.7x more buggy than human-written code: https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-gen...

Individual dev productivity gains are offset by peers having to review the verbose (and buggy) AI code: https://www.faros.ai/blog/ai-software-engineering

On agentic coding being the saviour of productivity, Meta measured only a 6-12% productivity boost from coding agents: https://www.youtube.com/watch?v=1OzxYK2-qsI&si=ABTk-2RZM-leT...

"But it's different now" :)



