If you're just chatting with it starting with "Hi", that's correct. The conversa...

pclmulqdq · 2025-06-02T00:14:07 1748823247

Usually, when people think about the prompt tokens for a chat model, the initial system prompt accounts for the vast majority of the tokens, and it is the same across many usage modes. You might have a slightly different system prompt for code than for English or for chatting, but that is only 3 prompts, which you can keep permanently in some sort of persistent KV cache. After that, only your specific request in that mode is uncached.
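
To make the accounting concrete, here is a toy sketch of that idea. Everything in it is made up for illustration: the prompt texts, the whitespace "tokenizer", and the `PrefixCache` class are hypothetical stand-ins, not any real serving stack's API. It only counts how many tokens would still need a prefill pass.

```python
from dataclasses import dataclass, field

# Hypothetical system prompts, one per usage mode (illustrative text only).
SYSTEM_PROMPTS = {
    "code": "You are a careful coding assistant. " * 50,
    "english": "You are a helpful writing assistant. " * 50,
    "chat": "You are a friendly conversational assistant. " * 50,
}


def tokenize(text: str) -> list[str]:
    """Toy whitespace tokenizer standing in for a real one."""
    return text.split()


@dataclass
class PrefixCache:
    """Toy stand-in for a persistent KV cache keyed by usage mode."""

    cached_prefix_len: dict[str, int] = field(default_factory=dict)

    def warm(self, mode: str) -> None:
        # Pretend to compute and keep the KV state for this mode's system prompt.
        self.cached_prefix_len[mode] = len(tokenize(SYSTEM_PROMPTS[mode]))

    def uncached_tokens(self, mode: str, user_request: str) -> int:
        # Tokens that still need prefill: the request itself, plus the
        # system prompt if its prefix has not been cached yet.
        request_len = len(tokenize(user_request))
        if mode in self.cached_prefix_len:
            return request_len
        return len(tokenize(SYSTEM_PROMPTS[mode])) + request_len


cache = PrefixCache()
request = "Write a function that reverses a list"
cold = cache.uncached_tokens("code", request)  # system prompt + request
cache.warm("code")
warm = cache.uncached_tokens("code", request)  # request only
print(cold, warm)
```

With the three prefixes warmed once, each request in any of those modes pays only for its own tokens, which is the whole point of keeping them in a persistent cache.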