Hacker News

The design of that study is pretty bad, and as a result it doesn't end up actually showing what it claims to show / what people claim it does.

https://www.fightforthehuman.com/are-developers-slowed-down-...



I don't think there is anything factually wrong with this criticism, but it largely rehashes caveats that are already well explored in the original paper, which goes to unusual lengths to clearly explain many ways the study is flawed.

The study gets so much attention because it's one of the few studies on the topic with this level of rigor in real-world scenarios, and it explains why previous studies or anecdotes may have claimed perceived increases in productivity even if there was no actual increase. It clearly sets a standard that we can't just ask people if they felt more productive (or they need to feel massively more productive to clearly overcome this bias).


> it largely rehashes caveats that are already well explored in the original paper, which goes to unusual lengths to clearly explain many ways the study is flawed. ... The study gets so much attention because it's one of the few studies on the topic with this level of rigor in real-world scenarios,

Yes, but most people don't seem aware of those caveats, this is a good summary of them, and I think it does undercut the "level of rigor" of the study. Additionally, some of what the article points out is not explicitly acknowledged and connected by the study itself.

For instance, if you actually split up the tasks by type, some tasks show a speedup and some show a slowdown, and the qualitative comments by developers about where they thought AI was good or bad aligned very well with which tasks showed which results.

Or (iirc) the fact that the task timing was per task, but developers' post hoc assessments were a prediction of how much they thought they were sped up on average across all tasks, meaning it's not really comparing the same things when comparing how developers felt vs how things actually went.

Or the fact that developers were actually no less accurate in predicting times to task completion overall with respect to AI vs non-AI tasks.

> and it explains why previous studies or anecdotes may have claimed perceived increases in productivity even if there was no actual increase.

Framing it that way assumes, as an already established fact that needs to be explained, that AI does not provide more productivity. Which actually demonstrates, inadvertently, why the study is so popular! People want it to be true, so even though the study is so chock full of caveats that it can't really prove that fact, let alone explain it, people appeal to it anyway.

> It clearly sets a standard that we can't just ask people if they felt more productive

Like we do for literally every other technological tool we use in software?

> (or they need to feel massively more productive to clearly overcome this bias).

All of this assumes a definition of productivity based on time per unit of work done, instead of, perhaps, the amount of effort required to get a unit of work done, or the extra time for testing, documentation, shoring up edge cases, and polishing features that better tools allow. Or the ability to overcome the dread and procrastination that come from dealing with rote, boilerplate tasks. AI makes me so much more productive that friends and my wife have commented on it explicitly without needing to be prompted, for a lot of reasons.



