While I generally agree with the premise that AI coding has underdelivered on the silver bullet it was marketed as (even if it doesn't feel that way), I gotta point out that the experiment and its results don't do a good job of capturing that. One of the biggest parts of using these AI tools is knowing which tasks they're most suitable for (and sometimes that means using them for only certain subtasks of a task). As mentioned, there are some tasks they absolutely excel at. Flipping a coin to decide whether to use AI or not is crude and unrealistic. It's hard to come up with a reliable method, though, and I also think METR has its glaring issues.