> For contexts and models with d_model > n_ctx/12, the context-dependent computational cost per token is a relatively small fraction of the total compute.

For GPT-3, n_ctx is 4096 and d_model is 12288, which is far larger than 4096/12 ≈ 341, so the condition holds comfortably.
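
To put a number on "relatively small", here is a rough sketch of the arithmetic. It assumes the common approximation that the per-token forward cost is about 2N + 2 · n_layer · n_ctx · d_model FLOPs, with N ≈ 12 · n_layer · d_model² non-embedding parameters. Under that assumption n_layer cancels, and the ratio of the context-dependent term to the parameter term is n_ctx / (12 · d_model). The function name is just for illustration.

```python
def context_cost_fraction(n_ctx: int, d_model: int) -> float:
    """Ratio of context-dependent FLOPs to parameter FLOPs per token.

    Assumes forward cost per token ~ 2*N + 2*n_layer*n_ctx*d_model
    with N ~ 12*n_layer*d_model**2, so n_layer cancels in the ratio.
    """
    return n_ctx / (12 * d_model)


# Values from the comment above (d_model = 12288 for the largest GPT-3 model).
ratio = context_cost_fraction(n_ctx=4096, d_model=12288)
print(f"context-dependent / parameter cost: {ratio:.3f}")
```

With these numbers the ratio is 1/36, so the attention-over-context term is under 3% of the parameter term. The two terms are equal exactly when d_model = n_ctx/12, which is where the threshold in the quote comes from.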