
But we're discussing a training technique that explicitly takes advantage of the continuous representations (embeddings rather than token probabilities) ...

You could quantize a model like this after training, as usual, but that's irrelevant.



The paper title is "Training Large Language Models to Reason in a Continuous Latent Space". It does say training, but the goal (reasoning in continuous space) happens at inference time.
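The inference-time distinction can be sketched with a toy model. This is a minimal illustration, not the paper's method: all names (`toy_step`, `EMBED`, `reason`) and the 2-d embeddings are made up. The point is the feedback loop: in latent mode the continuous hidden state is fed straight back as the next input, while in token mode it is snapped to the nearest token embedding first, discarding information at every step.

```python
# Toy vocabulary embeddings (illustrative, 2-d).
EMBED = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
}

def toy_step(x):
    """Stand-in for one transformer forward pass: input embedding -> hidden state."""
    return [0.9 * x[0] + 0.1 * x[1], 0.1 * x[0] + 0.9 * x[1]]

def nearest_token(h):
    """Discrete decoding: pick the token whose embedding is closest to h."""
    return min(EMBED, key=lambda t: sum((a - b) ** 2 for a, b in zip(EMBED[t], h)))

def reason(x, steps, latent=True):
    """Iterate reasoning steps at inference time.

    latent=True : feed the continuous hidden state back directly (Coconut-style).
    latent=False: quantize to the nearest token embedding between steps,
                  losing the off-vocabulary part of the state each time.
    """
    for _ in range(steps):
        h = toy_step(x)
        x = h if latent else EMBED[nearest_token(h)]
    return x

latent_out = reason(EMBED["a"], 3, latent=True)   # drifts continuously between tokens
token_out = reason(EMBED["a"], 3, latent=False)   # snaps back to a vocabulary point
```

Here `token_out` collapses back onto a vocabulary embedding after every step, while `latent_out` occupies intermediate points no token can express, which is the inference-time behavior the parent comment is pointing at.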



