> Do you disagree with that? I think that Qwen3 8B and 4B are SOTA for their siz... | Hacker News


tarruda 19 days ago | on: Mistral 3 family of models released

> Do you disagree with that? I think that Qwen3 8B and 4B are SOTA for their size.

The GPQA Diamond accuracy chart is weird: both Qwen3 8B and 4B have higher scores, so they used this weird chart where the x-axis shows the number of output tokens. I missed the point of this.

meatmanek 19 days ago

Generation time is more or less proportional to tokens * model size, so if you can get the same quality result with fewer tokens from the same size of model, then you save time and money.
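To make that proportionality concrete, here is a back-of-the-envelope sketch. It assumes cost scales exactly as output tokens times parameter count, which is a simplification that ignores prompt processing, batching, and hardware effects. The function name and the token counts are hypothetical illustrations, not figures from the chart.

```python
def relative_generation_cost(output_tokens: int, model_params_billions: float) -> float:
    """Rough cost proxy in arbitrary units: output tokens times model size."""
    return output_tokens * model_params_billions


# Hypothetical numbers: two models of the same size reaching similar quality,
# one needing 8000 output tokens and the other only 2000.
verbose = relative_generation_cost(output_tokens=8000, model_params_billions=8)
concise = relative_generation_cost(output_tokens=2000, model_params_billions=8)

# Ratio of the two cost proxies; model size cancels out when it is equal.
speedup = verbose / concise
print(f"Concise model is roughly {speedup:.1f}x cheaper per answer")
```

Under this assumption, at equal model size the savings depend only on the token ratio, which is presumably why the chart puts output tokens on the x-axis.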

kergonath 19 days ago

Thanks. That was not obvious to me either.
