
That's because they are as close to objective measures of capabilities as anything we're ever going to get.

Without benchmarks, you're down to evaluating model performance based on vibes and vibes only, which plain sucks. With benchmarks, you at least have numbers that correlate somewhat with capabilities.



That's assuming these benchmarks are the best we're ever going to get, which they clearly aren't. There's a lot to improve even without radical changes to how things are done.


The assumption I make is that "better benchmarks" are going to be 5% better, not 5000% better. LLMs are gaining capabilities faster than benchmarks are improving at measuring them accurately.

So, yes, we just aren't going to get anything that's radically better. Just more of the same, and some benchmarks that are less bad. Which is still good. But don't expect a Benchmark Revolution when everyone suddenly realizes just how Abjectly Terrible the current benchmarks are, and gets New Much Better Benchmarks to replace them with. The advances are going to be incremental, unimpressive, and meaningful only in aggregate.


So because there isn't a better measure, it's okay that tech companies effectively lie and treat these benchmarks like they mean more than they actually do?


Sorry, pal, but if benchmarks were to disagree with the opinions of a bunch of users saying "tech companies bad", I'd side with the benchmarks at least 9 times out of 10.


How does that have anything to do with what we're talking about?


What it has to do with it is this: your "tech companies are bad for using literally the best tool we have for measuring AI capabilities when talking about AI capabilities" take is a very bad take.

It's like you wanted to say "tech companies are bad", and the rest is just window dressing.



