Ask HN: What are the drawbacks of caching LLM responses?
1 point by XCSme on March 15, 2024 | 3 comments
I recently added AI integration to my application. While it works great, I dislike two things:

  1. I pay for all user prompts, even for duplicate ones.
  2. I am at the response-time mercy of the LLM API.
I could easily cache all prompts locally in a KV store and simply return the cached answer for duplicate ones.

Why isn't everyone doing this?

I assume one reason is that LLM responses are not deterministic - the same query can return different responses - but this could be handled with a "forceRefresh" parameter on the query.
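
Roughly what I had in mind - just a sketch, where callLLM and the in-memory Map stand in for the real API call and KV store:

  // Sketch only: callLLM and the Map-backed store are placeholders,
  // not the real API client or KV store.
  import { createHash } from "crypto";

  const cache = new Map<string, string>(); // swap for Redis/SQLite/etc.

  const cacheKey = (prompt: string) =>
    createHash("sha256").update(prompt).digest("hex");

  async function askLLM(
    prompt: string,
    callLLM: (p: string) => Promise<string>,
    forceRefresh = false
  ): Promise<string> {
    const key = cacheKey(prompt);
    if (!forceRefresh && cache.has(key)) {
      return cache.get(key)!; // duplicate prompt: no API cost, no latency
    }
    const answer = await callLLM(prompt);
    cache.set(key, answer);
    return answer;
  }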



Two major ones: you now need to handle all the usual cache issues like invalidation (what happens when you want to upgrade or the model improves?), and you also need to think about security - given the drastic timing difference between a cache hit and a real call, anyone can probe your cache to figure out what calls have been made and extract anything in the prompts, like passwords or PII (e.g. going token by token and trying the top possibilities each time).
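
To make the timing point concrete, here's a rough sketch of the single probe that a token-by-token attack would repeat (the endpoint URL and 500ms threshold are made up):

  // Hypothetical probe against a shared prompt cache.
  async function wasAskedBefore(candidatePrompt: string): Promise<boolean> {
    const start = Date.now();
    await fetch("https://your-app.example/ask", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt: candidatePrompt }),
    });
    const elapsed = Date.now() - start;
    // Cache hits come back in milliseconds, real LLM calls take seconds,
    // so latency alone reveals whether this exact prompt was asked before.
    return elapsed < 500;
  }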


> happens when you want to upgrade or the model improves

I was thinking of prefixing the key of each query with the model that returned it, e.g. model_gpt3.5-1000.
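
Something like this (the model id is just an example):

  // Sketch: prefix the cache key with the model that produced the answer,
  // so switching or upgrading models naturally misses the old entries.
  import { createHash } from "crypto";

  function versionedCacheKey(modelId: string, prompt: string): string {
    const promptHash = createHash("sha256").update(prompt).digest("hex");
    return `model_${modelId}:${promptHash}`; // e.g. "model_gpt3.5-1000:9f2a..."
  }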

> anyone can probe your cache to figure out what calls have been made

My use-case is local-only[0], where each user sends requests from their own machine. I could maybe cache by default and add some indication that the answer was returned from cache, alongside a "force regenerate answer" button (rough sketch below).

[0]: https://docs.uxwizz.com/guides/ask-ai-new
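
Something like this for the response shape (the field and function names are just illustrative):

  // Sketch: include a fromCache flag alongside the answer so the UI can
  // show "answered from cache" plus a "force regenerate answer" button.
  interface AskResult {
    answer: string;
    fromCache: boolean;
  }

  function cacheNotice(result: AskResult): string {
    return result.fromCache
      ? "Answered from cache - click to force regenerate"
      : "Fresh answer from the model";
  }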


Just found this: https://github.com/zilliztech/GPTCache, which seems to address this exact idea/issue.



