>"It doesn't matter, the real benchmark is taking the community temperature on t...

>"It doesn't matter, the real benchmark is taking the community temperature on the model after a few weeks of usage."

Indeed. It's almost impossible to truly know a model before spending a few million tokens on a real world task. It will take a step-change level advancement at this point for me to trust anything but Claude right now.