
That's because they are as close to objective measures of capabilities as anything we're ever going to get.

Without benchmarks, you're down to evaluating model performance based on vibes and vibes only, which plain sucks. With benchmarks, you at least have numbers that correlate somewhat with capabilities.



That's assuming these benchmarks are the best we're ever going to get, which they clearly aren't. There's a lot to improve even without radical changes to how things are done.


The assumption I make is that "better benchmarks" are going to be 5% better, not 5000% better. LLMs are gaining capabilities faster than benchmarks are improving at measuring them accurately.

So, yes, we just aren't going to get anything that's radically better. Just more of the same, and some benchmarks that are less bad. Which is still good. But don't expect a Benchmark Revolution when everyone suddenly realizes just how Abjectly Terrible the current benchmarks are, and gets New Much Better Benchmarks to replace them with. The advances are going to be incremental, unimpressive, and meaningful only in aggregate.


So because there isn't a better measure, it's okay that tech companies effectively lie and treat these benchmarks like they mean more than they actually do?


Sorry, pal, but if benchmarks were to disagree with the opinions of a bunch of users saying "tech companies bad", I'd side with the benchmarks at least 9 times out of 10.


How does that have anything to do with what we're talking about?


What it has to do with it is this: your "tech companies are bad for using literally the best tool we have for measuring AI capabilities when talking about AI capabilities" take is a very bad take.

It's like you wanted to say "tech companies are bad", and the rest is just window dressing.



