Hacker News

DeepSeek v3/R1 isn't based on the Llama architecture. It combines several approaches in a novel way and contributes new ones of its own.

Meta has never released a mixture-of-experts (MoE) model (according to reliable rumors, they failed to train a good one). And MoE is only one of several ingredients that make DeepSeek v3/R1 interesting and good.
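For context, the core MoE idea is sparse routing: each token is processed by only a few "expert" sub-networks, chosen per token by a learned router. A minimal sketch in plain NumPy, with toy dimensions, not DeepSeek's actual fine-grained-expert design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only; real models use far larger values.
d_model, n_experts, top_k = 8, 4, 2

def moe_layer(x, expert_weights, router_weights):
    """Route each token to its top-k experts and mix their outputs.

    x: (tokens, d_model)
    expert_weights: (n_experts, d_model, d_model), one linear map per expert
    router_weights: (d_model, n_experts)
    """
    logits = x @ router_weights                    # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        gates = np.exp(sel - sel.max())
        gates /= gates.sum()                       # softmax over selected experts
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ expert_weights[e])
    return out

x = rng.normal(size=(3, d_model))
experts = rng.normal(size=(n_experts, d_model, d_model))
router = rng.normal(size=(d_model, n_experts))
y = moe_layer(x, experts, router)
print(y.shape)  # (3, 8)
```

Only `top_k` of the `n_experts` matrices are applied per token, which is how MoE models get large parameter counts with modest per-token compute.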


