
Yup, in my private evals I have repeatedly found that DeepSeek has the best models for everything, and yet in a lot of the public ones someone else always seems to be on top. I don't know why.


Publishing them might help you find out.


^ This.

If I had to hazard a guess, as a poor soul doomed to maintain several closed and open source models acting agentically, I think you are hyperfocused on chat trivia use cases (DeepSeek has a very, very hard time with tool calling, and they say as much themselves in their API docs).



