aspenmartin 55 days ago | on: 2025: The Year in LLMs

I could have guessed you would say that :) but METR is not an unbiased study either. Maybe you mean that METR is less likely to intentionally inflate their numbers? If you insist on or believe in a conspiracy, I don’t think there’s really anything I or others will be able to say or show you that would assuage you. All I can say is that I’ve seen the raw data. It’s a mess, and again we’re stuck with proxies (which are bad since you start conflating the change in the proxy-latent relationship with the treatment effect). And it’s also hard and arguably irresponsible to run RCTs. All I will say is: there are flaws everywhere. METR results are far from conclusive. Totally understandable if there is a mismatch between perception and performance. But also consider: even if a task takes the same or even slightly more time, one big advantage for me is that it substantially reduces cognitive load, so I can work in parallel sessions on two completely different issues.

bopbopbop7 55 days ago

I bet it does reduce your cognitive load, considering that you, in your own words, "Give up when Claude is hopelessly lost". No better way to reduce cognitive load.

aspenmartin 55 days ago

I give up using Claude when it gets hopelessly lost, and then my cognitive load increases.
