Yes, but the new "thing" now is "agentic", where the driver is tool use. At every point where the LLM decides to use a tool, a new request gets sent. So for a simple task where the model needs to edit one function down the tree, there might be ~10 calls: the 1st with the task, 2-5 for "read_file", then the model starts writing code, 6-7 trying to run it, 8 fixing something, and so on...
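Roughly, the loop looks like this (just a sketch; `call_model` and `run_tool` are hypothetical stand-ins for whatever LLM API and tool executor is actually in play):

```python
def call_model(messages):
    """Placeholder: send the FULL message history to the model, get one reply.
    The reply is either a tool call or final assistant text."""
    raise NotImplementedError

def run_tool(tool_call):
    """Placeholder: execute read_file / run code / apply edit locally."""
    raise NotImplementedError

def agent(task, max_steps=20):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)           # one full-priced request per step
        messages.append(reply)                 # history keeps growing
        if "tool_call" not in reply:           # model gave a final answer
            return reply["content"]
        result = run_tool(reply["tool_call"])  # e.g. read_file, run tests, ...
        messages.append({"role": "tool", "content": result})
```

Every iteration of that loop is a separate request carrying everything that came before it.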
Without caching, the price goes up with every message or tool call, because you have to resend the entire history after each one. With no discount for those repeated tokens, you're looking at very expensive chat threads.
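Back-of-the-envelope (made-up numbers): if each step adds ~1k tokens of context, ten steps means you resend roughly 1k + 2k + ... + 10k = 55k prompt tokens in total, even though only ~10k of them are new.

```python
# Illustrative only; step_tokens is an assumed average, not a measured figure.
step_tokens = 1_000
steps = 10
total_sent = sum(step_tokens * n for n in range(1, steps + 1))  # 55,000 tokens billed
new_only   = step_tokens * steps                                # 10,000 tokens of new content
print(total_sent, new_only)
```

The billed-token total grows quadratically with the number of tool calls, while the useful new content only grows linearly, which is exactly the gap a cached-token discount is supposed to close.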