Hacker News
Show HN: Scorecard – Evaluate LLMs like Waymo simulates cars (scorecard.io)
7 points by Rutledge 66 days ago
Agenteval.org: An Open-Source Benchmarking Initiative for AI Agent Evaluation (scorecard.io)
6 points by Rutledge 9 months ago | 1 comment
