
This seems to be rate limited by message, not token, so the lack of caching may matter less


No, it’s by token. The FAQ says this:

> Actual number of messages per day depends on token usage per request. Estimates based on average requests of ~8k tokens each for a median user.

https://cerebras-inference.help.usepylon.com/articles/346886...


How did you find that? Are you sure it applies to Cerebras Code Pro or Max?


Yes, but the new "thing" now is "agentic" coding, where the driver is tool use. At every point where the LLM decides to use a tool, a new request gets sent. So for a simple task where the model needs to edit one function down the tree, there might be 10 calls: the 1st with the task, 2-5 for "read_file", then the model starts writing code, 6-7 trying to run the code, 8 fixing something, and so on...
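
Roughly what that loop looks like (a minimal sketch in Python; call_llm() and run_tool() are hypothetical stand-ins, not any real agent framework's API):

    # Why one "message" turns into many billed requests:
    # every tool call round-trips through the model again.
    def call_llm(messages):
        # Stand-in for a real chat-completions call; pretend the
        # model wants a tool on the first few turns, then finishes.
        if sum(m["role"] == "tool" for m in messages) < 4:
            return {"role": "assistant", "tool_call": "read_file"}
        return {"role": "assistant", "content": "done", "tool_call": None}

    def run_tool(name):
        return f"<output of {name}>"

    messages = [{"role": "user", "content": "fix parse_config()"}]
    requests_sent = 0
    while True:
        reply = call_llm(messages)   # each iteration is one rate-limited request
        requests_sent += 1
        messages.append(reply)       # the full history is re-sent every time
        if reply["tool_call"] is None:
            break
        messages.append({"role": "tool", "content": run_tool(reply["tool_call"])})

    print(requests_sent)  # 5 requests for a single user prompt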


Yup. If you’ve ever watched a 60+ minute agent loop spawning sub-agents, your “one message” prompt leaves you several hundred messages in the hole.


The lack of caching causes the price to increase with each message or tool call in a chat, because you have to send the entire history back after every tool call. Since there isn’t any discount for cached tokens, you’re looking at very expensive chat threads.
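
Back-of-the-envelope version (a sketch assuming a flat per-token input price and zero cache discount; the numbers are made up for illustration):

    # Without a cache discount, every request re-bills the whole
    # history at full price, so input cost grows quadratically.
    base = 8_000      # initial prompt + context, in tokens
    per_turn = 1_000  # tokens added per tool call (tool result + reply)
    turns = 30        # tool calls in one agent session

    billed = sum(base + per_turn * i for i in range(turns))
    print(billed)  # 675,000 input tokens billed for a thread that
                   # only "contains" about 38,000 tokens

With caching, most of those resent tokens would be billed at a steep discount; without it, you pay full price on every resend.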



