Boiling frog. The advances are happening so rapidly, yet so incrementally, that they aren't registering. It just seems like the normal state.
Compare LLMs from a year or two ago with the ones out today on practically any task. It's a night-and-day difference.

This is especially so once you take into account these "reasoning" models. It's mind-blowing how much better they are than "non-reasoning" models for tasks like planning and coding.
Hmm, I guess it's the way I use them then, because the latest models feel almost less intelligent than the likes of GPT-4. It's certainly not a "night and day" difference in my daily or every-other-day use. I guess the improvement is probably far more noticeable on benchmarks and on more advanced tasks than mine, but I would have assumed that was the minority and that most people use these models the way I do.
> Compare LLMs from a year or two ago with the ones out today on practically any task. It's night and day difference.
>
> This is specially so when you start taking into account these "reasoning" models. It's mind blowing how much better they are than "non-reasoning" models for tasks like planning and coding.
https://aider.chat/docs/leaderboards/#aider-polyglot-benchma...