Hacker News

+1

There are already many prompt/LLM routers available.

We've never found value in them, for similar reasons as mentioned above.



That's great feedback - thank you! I would then assume most people end up using open-source models, as they tend to be cheaper with faster inference.


I can save you a bit of market research and tell you that's unfortunately not the case in the market today. There are a few reasons for it - the main one, in my opinion, being that it's hard to measure the value vs. the cost of switching to an open LLM, so the switch is generally perceived as same/lower value at higher cost (not in terms of inference, but in terms of overhead). What does get considered, however, are cost-saving options around the foundation models: going for the mini models, prompt caching, batch inference, etc. Some tooling in that area might be interesting.
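To make "batch inference" concrete: OpenAI's Batch API takes a JSONL file of requests and returns results asynchronously at a discount. A minimal sketch of preparing such a file, following the documented Batch API request shape (the model name and prompts here are placeholders, not from the thread):

```python
import json

# Each line of the batch file is one request targeting the
# /v1/chat/completions endpoint, keyed by a custom_id so results
# can be matched back to inputs.
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",  # placeholder model choice
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize doc A", "Summarize doc B"])
]

# Write one JSON object per line (JSONL), as the Batch API expects.
with open("requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

The file is then uploaded with `purpose="batch"` and submitted via `client.batches.create(..., completion_window="24h")`; results arrive as an output file within that window.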


We've actually found the opposite -- every client project has been based on GPT-4 or Gemini, with one exception for a highly sensitive use case built on Llama 3.1.

The main reason is that the APIs represent an excellent cost / performance / complexity tradeoff.

Every project has relied primarily on the big models because the small models just aren't as capable in a business context.

We have found that GPT-4o is very fast when that's necessary (often it's not), and it's also very cheap (GPT-4o batch is ~96% cheaper than the original GPT-4). And where cost is a concern and reasoning doesn't need to be as good as possible, GPT-4o mini has been excellent too.
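The ~96% figure checks out as back-of-the-envelope arithmetic, assuming the public list prices at the time (treat the numbers below as assumptions, not quotes from the thread):

```python
# Assumed USD prices per 1M input tokens:
GPT4_ORIGINAL = 30.00   # original GPT-4 (8k context) input price
GPT4O = 2.50            # GPT-4o input price
BATCH_DISCOUNT = 0.50   # Batch API: 50% off list price

gpt4o_batch = GPT4O * BATCH_DISCOUNT        # $1.25 per 1M tokens
savings = 1 - gpt4o_batch / GPT4_ORIGINAL   # fraction cheaper vs. GPT-4
print(f"{savings:.1%}")                     # prints 95.8%
```

So GPT-4o via the Batch API works out to roughly 96% cheaper per input token than launch-era GPT-4, consistent with the claim above.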


Thank you again - great context for me.



