Based on my benchmarks (hundreds of model generations), 2.5 sits between GPT-5 and GPT-5.1, with GPT-5 being the best of the three.

In preliminary evals, Gemini 3 seems to be way better than all of them, but I'll know for sure once I run extended benchmarks tonight.