Hacker News

Silly. If you are going to come up with a new benchmark, then include capable models. They have Opus, Gemini Pro, and then Qwen3-32B.

Why not qwen3-coder-480b, qwen3-235b-instruct, deepseek-v3.1, kimi-k2, GLM-4.5, gpt-oss-120b?


