
Even with something like a 5090, I’d still run Q4_K_S/Q4_K_M because they’re far more resource-efficient for inference (see the rough math below).

Also, the 3090 supports NVLink, which is actually more useful for inference speed than native BF16 support.

Maybe BF16 matters if you're training?
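
For intuition, here's a back-of-the-envelope sketch (Python) of why the Q4_K quants fit where BF16 doesn't. The bits-per-weight figures are approximate averages for llama.cpp's K-quants, not exact values, and KV cache/activations are ignored:

    def weight_vram_gb(n_params, bits_per_weight):
        # Weights only; ignores KV cache, activations, and runtime overhead.
        return n_params * bits_per_weight / 8 / 1e9

    n = 70e9  # e.g. a 70B-parameter model
    for name, bpw in [("BF16", 16.0), ("Q4_K_S", 4.6), ("Q4_K_M", 4.9)]:
        print(f"{name:7s} ~{weight_vram_gb(n, bpw):.1f} GB")

On that math a 70B model is ~140 GB of weights at BF16 but only ~43 GB at Q4_K_M, i.e. it squeezes onto a pair of 24 GB cards.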



That's a smart thing to do, considering the 5090 has native tensor cores for 4-bit precision...
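
If you want to see what your card actually exposes, here's a minimal check with PyTorch (real APIs; the sm_120/FP4 note on the 5090 is an assumption about Blackwell, not something PyTorch reports directly):

    import torch

    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        major, minor = torch.cuda.get_device_capability(0)
        print(f"{name}: compute capability sm_{major}{minor}")
        # BF16 tensor-core support arrived with Ampere (sm_80), so a 3090
        # reports True here; 4-bit (FP4) tensor cores are a Blackwell
        # feature (assumed sm_120 on the 5090).
        print("BF16 supported:", torch.cuda.is_bf16_supported())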



