		segmondy 80 days ago \| on: SWE-Bench Pro

Silly. If you are going to come up with a new benchmark, then add capable models. They have Opus, Gemini Pro, and then Qwen3-32B. Why not qwen3-coder-480b, qwen3-235b-instruct, deepseek-v3.1, kimi-k2, GLM-4.5, gpt-oss-120b?