Where are you getting SWE-Bench Verified scores for 5.2-Codex? AFAIK those have not been published.

And I don't think your Terminal-Bench 2.0 scores are accurate. Per the latest benchmarks, Opus 4.5 is at 59% and GPT-5.2-Codex is at 64%.

See the charts at the bottom of https://marginlab.ai/blog/swe-bench-deep-dive/ and https://marginlab.ai/blog/terminal-bench-deep-dive/


