> Do you disagree with that?

I think Qwen3 8B and 4B are SOTA for their size. The GPQA Diamond accuracy chart is weird: both Qwen3 8B and 4B have higher scores, so they used this odd chart where the x-axis shows the number of output tokens. I don't see the point of that.



Generation time is more or less proportional to tokens * model size, so if you can get the same quality result with fewer tokens from the same size of model, then you save time and money.
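
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes cost scales simply as output tokens times parameter count and ignores prompt processing, batching, and hardware effects. The function name and the token counts are made up for illustration, not taken from the chart.

```python
def relative_generation_cost(output_tokens: int, params_billions: float) -> float:
    """Rough cost proxy: output tokens times parameter count (in billions)."""
    return output_tokens * params_billions


# Hypothetical numbers: two 8B models reaching the same answer quality,
# one using half as many output tokens as the other.
verbose = relative_generation_cost(output_tokens=8000, params_billions=8)
concise = relative_generation_cost(output_tokens=4000, params_billions=8)

print(concise / verbose)  # 0.5, i.e. about half the time and money
```

Under this proxy, a model of the same size that needs half the tokens costs about half as much to run, which is why plotting accuracy against output tokens can be informative.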


Thanks. That was not obvious to me either.



