
>"It doesn't matter, the real benchmark is taking the community temperature on the model after a few weeks of usage."

Indeed. It's almost impossible to truly know a model before spending a few million tokens on a real-world task. At this point, it would take a step-change advancement for me to trust anything but Claude.


