Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

From the paper

> For contexts and models with d_model > n_ctx/12, the context-dependent computational cost per token is a relatively small fraction of the total compute.

For GPT3, n_ctx is 4096 and d_model is 12228 >> 4096/12.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: