Importantly, though, LLMs do not take the embeddings as input during training; they take the tokens and learn the embeddings as part of training.

Specifically, all Transformer-based models do; older models used pretrained embeddings like word2vec or ELMo, but all current LLMs train their embeddings from scratch.
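
To make that concrete, here's a minimal PyTorch sketch (all names and dimensions are illustrative, not from any particular model): the embedding table is just another weight matrix, initialized randomly and updated by the same optimizer as the rest of the network.

    import torch
    import torch.nn as nn

    vocab_size, d_model = 50_000, 512  # illustrative values

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            # Learned lookup table: token id -> d_model vector. No word2vec
            # here; these weights start random and train end to end.
            self.embed = nn.Embedding(vocab_size, d_model)
            self.block = nn.TransformerEncoderLayer(
                d_model, nhead=8, batch_first=True)
            self.lm_head = nn.Linear(d_model, vocab_size)

        def forward(self, token_ids):      # token_ids: (batch, seq) ints
            x = self.embed(token_ids)      # (batch, seq, d_model)
            x = self.block(x)
            return self.lm_head(x)         # next-token logits

    model = TinyLM()
    # embed.weight is in model.parameters(), so gradient descent updates
    # the embeddings along with every other layer.
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)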



And tokens are now going down to the byte level:

https://ai.meta.com/research/publications/byte-latent-transf...
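
The byte-level idea in a couple of lines of Python (a rough illustration only; the linked paper's dynamic patching scheme is more involved, but the input unit is still the byte):

    # Raw UTF-8 bytes give a fixed 256-entry "vocabulary" with no
    # tokenizer to train.
    text = "naïve café"
    byte_ids = list(text.encode("utf-8"))
    print(byte_ids)       # [110, 97, 195, 175, ...] -- every id is 0..255
    print(len(byte_ids))  # longer than a subword sequence for the same text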
