DeepSeek V3/R1 isn't based on the Llama architecture. It uniquely combines and contributes several novel approaches: multi-head latent attention (MLA), fine-grained mixture-of-experts with shared experts, and multi-token prediction, among others.
Meta never released a mixture-of-experts model (they failed to train a good one, according to reliable rumors). And MoE is just one of several ingredients that make DeepSeek V3/R1 interesting and good.
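For anyone unfamiliar with the MoE idea: each token is routed to a small subset of expert FFNs instead of one big FFN, so only a fraction of parameters is active per token. Below is a minimal sketch of a top-k routed MoE layer; all dimensions, the expert count, and top_k are made-up illustration values, and real designs (DeepSeek's included) add load-balancing mechanisms and shared experts that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a mixture-of-experts (MoE) layer with top-k routing.
# Illustrative only: sizes and top_k are arbitrary; load balancing and
# shared experts (as in DeepSeek's actual design) are left out.
class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        # Route each token to its top_k experts, renormalize gate weights.
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)  # which (token, slot) pairs chose expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out

x = torch.randn(4, 512)
print(MoELayer()(x).shape)  # torch.Size([4, 512])
```

Per forward pass, each token only touches 2 of the 8 expert FFNs here, which is why MoE models can have huge total parameter counts while keeping per-token compute modest.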