
For comparison, Groq [1] has (price per million tokens of input vs output):

    Llama 2 70B       (4096 context)   ~300 tokens/s   $0.70/$0.80
    Llama 2 7B        (2048 context)   ~750 tokens/s   $0.10/$0.10
    Mixtral 8x7B SMoE (32K context)    ~480 tokens/s   $0.27/$0.27
    Gemma 7B          (8K context)     ~820 tokens/s   $0.10/$0.10

[1] https://wow.groq.com/
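
To make those numbers concrete, here's a back-of-the-envelope cost check (my own sketch, not from the comment): prices are quoted per million tokens, split into input and output rates, so a single request costs fractions of a cent.

    def request_cost(input_tokens: int, output_tokens: int,
                     in_price: float, out_price: float) -> float:
        """Cost in dollars, given per-million-token prices."""
        return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

    # e.g. Llama 2 70B at $0.70 in / $0.80 out:
    # a 2,000-token-in / 1,000-token-out request
    print(request_cost(2_000, 1_000, 0.70, 0.80))  # ~$0.0022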


And zero capacity. Groq is coming across as a total paper tiger. No billing, unusable rate limits, and, most importantly, a request queue that makes it dramatically slower than any other option in practice.

They say they're just waiting on implementing billing, but at this point it reads more like "we wouldn't be able to meet the demand your requests would generate."

-

Groq is going through all of that to offer ~500 tokens/s in theory, while I'm seeing Fireworks.ai sustain 300+ tokens/s in production use.
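
If you want to verify that for yourself, a minimal sketch for measuring end-to-end throughput follows. It assumes the provider exposes an OpenAI-compatible streaming endpoint (both Groq and Fireworks advertise one); the base URL and model name below are placeholders, and chunk count is only a rough proxy for token count. Crucially, the clock starts before the request, so any queue wait counts against the provider.

    import time
    from openai import OpenAI

    def measure_tokens_per_second(base_url: str, api_key: str, model: str) -> None:
        client = OpenAI(base_url=base_url, api_key=api_key)
        start = time.monotonic()
        first_token_at = None
        chunks = 0  # each streamed chunk is roughly one token
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Write 500 words about rivers."}],
            stream=True,
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                if first_token_at is None:
                    first_token_at = time.monotonic()  # includes any queue wait
                chunks += 1
        elapsed = time.monotonic() - start
        print(f"time to first token: {first_token_at - start:.2f}s")
        print(f"~{chunks / elapsed:.0f} tokens/s end to end")

Run it against each provider with the same prompt and the queue-induced latency shows up directly in the time-to-first-token figure, not just the steady-state rate.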



