Yup in my private evals I have repeatedly found that DeepSeek has the best models for everything and yet in a lot of these public ones it always seems like someone else is on the top. I don't know why.
If I had to hazard a guess, as a poor soul doomed to maintain several closed and open source models acting agentically, I think you are hyper focused on chat trivia use cases (DeepSeek has a very, very, hard time tool calling and they say as much themselves in their API docs)