That's great feedback - thank you! I would then assume most people would end up using open-source models, as they tend to be cheaper with faster inference.
I can save you a bit of market research and tell you that's unfortunately not the case in the market today. There are a few reasons for it - the main one, in my opinion, being that it's hard to measure the value vs. the cost of switching to an open LLM, so the switch is generally perceived as same/lower value at higher cost (not in terms of inference, but in terms of overhead). What companies do consider, however, are cost-saving options around the foundation models: using the mini models, prompt caching, batch inference, etc. Some tooling in that area might be interesting.
We've actually found the opposite: every client project has been based on GPT-4 or Gemini, with one exception for a highly sensitive use case built on Llama 3.1.
The main reason is that the APIs offer an excellent cost/performance/complexity tradeoff.
Every project has relied primarily on the big models because the small models just aren't as capable in a business context.
We have found that GPT-4o is very fast when speed matters (often it doesn't), and it's also very cheap (GPT-4o via the Batch API is ~96% cheaper than the original GPT-4). And where cost is a concern and reasoning doesn't need to be as good as possible, GPT-4o mini has been excellent too.
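To put that ~96% figure in context, here's a quick back-of-the-envelope sketch. The prices are assumptions based on published list rates at the time (GPT-4 at roughly $30 per 1M input tokens, GPT-4o at $2.50, and the Batch API offering a 50% discount); they change often, so treat the numbers as illustrative, not authoritative:

```python
# Back-of-the-envelope cost comparison (assumed list prices, USD per 1M input tokens).
GPT4_INPUT = 30.00      # original GPT-4 (assumed list price)
GPT4O_INPUT = 2.50      # GPT-4o (assumed list price)
BATCH_DISCOUNT = 0.50   # Batch API halves the per-token price (assumed)

# Effective GPT-4o price when run through the Batch API.
gpt4o_batch = GPT4O_INPUT * BATCH_DISCOUNT  # $1.25 per 1M tokens

# Fractional savings relative to the original GPT-4.
savings = 1 - gpt4o_batch / GPT4_INPUT

print(f"GPT-4o batch input: ${gpt4o_batch:.2f} per 1M tokens")
print(f"Savings vs original GPT-4: {savings:.0%}")  # ~96%
```

Under those assumed prices the savings come out to about 96%, which matches the rough figure above; with different (or current) rates the exact percentage will shift.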
There are already many prompt/LLM routers available.
We've never found value in them, for reasons similar to those mentioned above.