
No, I am not affiliated with the website. I just want to see more discussion based on uncontaminated benchmarks, and I feel that people rely too much on benchmarks that companies can run themselves. When a company can run the benchmark itself, I don't feel I can trust the results. For general LLM capabilities, for example, I would also tend to rely on dubesor [1] rather than Artificial Analysis or similar leaderboards.

[1] https://dubesor.de/benchtable




