Where are you getting SWE-Bench Verified scores for 5.2-Codex? AFAIK those have ... | Hacker News


qwesr123 22 days ago | on: GPT-5.2-Codex

Where are you getting SWE-Bench Verified scores for 5.2-Codex? AFAIK those have not been published.

And I don't think your Terminal-Bench 2.0 scores are accurate. Per the latest benchmarks:

Opus 4.5 is at 59%
GPT-5.2-Codex is at 64%

See the charts at the bottom of https://marginlab.ai/blog/swe-bench-deep-dive/ and https://marginlab.ai/blog/terminal-bench-deep-dive/
