That's great feedback - thank you! I would then assume most people would end up using open-source models, as they tend to be cheaper with faster inference.
I can save you a bit of market research and tell you that's unfortunately not the case in the market today. There are a few reasons for it - the main one, in my opinion, being that it's hard to measure the value vs. the cost of switching to an open LLM, so the switch is generally perceived as same/lower value at higher cost (not in terms of inference, but in terms of overhead). What companies do consider, however, are cost-saving options around the foundation models: using the mini models, prompt caching, batch inference, etc. Some tooling in that area might be interesting.
We've actually found the opposite: every client project has been based on GPT-4 or Gemini, with one exception for a highly sensitive use case built on Llama 3.1.
The main reason is that the APIs offer an excellent cost/performance/complexity tradeoff.
Every project has relied primarily on the big models because the small models just aren't as capable in a business context.
We have found that GPT-4o is very fast when speed matters (often it doesn't), and it's also very cheap (GPT-4o via the Batch API is ~96% cheaper than the original GPT-4). And where cost is a concern and reasoning doesn't need to be as good as possible, GPT-4o mini has been excellent too.
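To put that ~96% figure in context, here's a quick back-of-the-envelope sketch. The prices are assumptions based on published list rates at the time (GPT-4 at roughly $30 per 1M input tokens, GPT-4o at $2.50, and the Batch API offering a 50% discount); they change often, so treat the numbers as illustrative, not authoritative:

```python
# Back-of-the-envelope cost comparison (assumed list prices, USD per 1M input tokens).
GPT4_INPUT = 30.00      # original GPT-4 (assumed list price)
GPT4O_INPUT = 2.50      # GPT-4o (assumed list price)
BATCH_DISCOUNT = 0.50   # Batch API halves the per-token price (assumed)

# Effective GPT-4o price when run through the Batch API.
gpt4o_batch = GPT4O_INPUT * BATCH_DISCOUNT  # $1.25 per 1M tokens

# Fractional savings relative to the original GPT-4.
savings = 1 - gpt4o_batch / GPT4_INPUT

print(f"GPT-4o batch input: ${gpt4o_batch:.2f} per 1M tokens")
print(f"Savings vs original GPT-4: {savings:.0%}")  # ~96%
```

Under those assumed prices the savings come out to about 96%, which matches the rough figure above; with different (or current) rates the exact percentage will shift.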
There are already many prompt/LLM routers available.
We've never found value in them, for reasons similar to those mentioned above.