bakugo | 11 months ago | on: Questions censored by DeepSeek
The "distilled+quantized versions" are not the same model at all, they are existing models (Llama and Qwen) finetuned on outputs from the actual R1 model, and are not really comparable to the real thing.
raxxor | 11 months ago
That is semantics; they are strongly comparable in their inputs and outputs. Distillation is different from finetuning.
Sure, you could say that only running the 600B+ parameter model is running "the real thing"...
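For concreteness, here is a minimal PyTorch sketch of the two training signals being contrasted in this thread. The function names and tensors are hypothetical, not DeepSeek's actual training code: classic logit distillation matches the teacher's softened output distribution, while finetuning on teacher-generated text (what the thread says the released R1 "distilled" models are) is ordinary cross-entropy on tokens the teacher produced.

    import torch
    import torch.nn.functional as F

    def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Classic knowledge distillation: KL divergence between the
        # student's and teacher's temperature-softened distributions.
        t = temperature
        soft_targets = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        return F.kl_div(student_log_probs, soft_targets,
                        reduction="batchmean") * (t * t)

    def sft_on_teacher_outputs_loss(student_logits, teacher_token_ids):
        # Supervised finetuning on text *generated by* the teacher:
        # plain next-token cross-entropy, no access to teacher logits.
        vocab_size = student_logits.size(-1)
        return F.cross_entropy(student_logits.view(-1, vocab_size),
                               teacher_token_ids.view(-1))

Both are loosely called "distillation", but only the first requires the teacher's probability distribution; the second only needs sampled outputs, which is why it can be applied to any base model such as Llama or Qwen.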