From the paper > For contexts and models with d_model > n_ctx/12, the context-de...

vishal0123 on March 2, 2023, on: Introducing ChatGPT and Whisper APIs

From the paper:

> For contexts and models with d_model > n_ctx/12, the context-dependent computational cost per token is a relatively small fraction of the total compute.

For GPT-3, n_ctx is 4096 and d_model is 12288 >> 4096/12.
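
To put a number on that comparison, here is a rough sketch of the arithmetic. It assumes the approximations from the scaling-laws paper being quoted (non-embedding parameters N ≈ 12 · n_layer · d_model², forward compute per token ≈ 2N + 2 · n_layer · n_ctx · d_model), takes d_model = 12288 and n_layer = 96 as reported for the 175B GPT-3 model, and uses the n_ctx = 4096 figure from the comment. The function name `context_cost_ratio` is made up for illustration.

```python
# Sketch only: uses the scaling-laws paper's approximations (assumed, not re-derived):
#   non-embedding params   N ~= 12 * n_layer * d_model**2
#   forward FLOPs / token    ~= 2*N + 2 * n_layer * n_ctx * d_model

def context_cost_ratio(n_ctx: int, d_model: int, n_layer: int = 96) -> float:
    """Context-dependent FLOPs divided by parameter FLOPs, per token."""
    n_params = 12 * n_layer * d_model**2
    param_flops = 2 * n_params
    context_flops = 2 * n_layer * n_ctx * d_model
    # n_layer cancels, so this simplifies to n_ctx / (12 * d_model).
    return context_flops / param_flops

n_ctx, d_model = 4096, 12288       # n_ctx as stated in the comment
threshold = n_ctx / 12             # the paper's d_model > n_ctx/12 condition
ratio = context_cost_ratio(n_ctx, d_model)

print(f"n_ctx/12 = {threshold:.1f}, d_model = {d_model}")
print(f"context-dependent / parameter compute = {ratio:.4f}")
```

Under these assumptions the ratio works out to 1/36, so the context-dependent term is roughly 3% of the parameter term, which is what "relatively small fraction" means here.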