
TBH, I'm finding that people underestimate how useful CPUs are for both inference and fine-tuning. PEFT with access to 64GB+ of RAM and lots of cores can sometimes be cost-effective.
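
Rough sketch of what I mean, assuming the Hugging Face transformers/peft stack; the model name, target modules, and hyperparameters below are just placeholders, not a recommendation:

    # LoRA fine-tuning on CPU: only small adapter matrices are trained,
    # so optimizer state and gradients fit comfortably in system RAM.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "gpt2"  # placeholder; any small causal LM works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)  # stays on CPU by default

    lora_config = LoraConfig(
        r=8,                        # low-rank adapter dimension
        lora_alpha=16,
        target_modules=["c_attn"],  # attention projection in GPT-2; model-specific
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of total params

    torch.set_num_threads(32)  # use the cores you have; tune to your machine

From there a standard Trainer loop works unchanged; the point is just that the trainable footprint is small enough that plenty of RAM and cores can stand in for a GPU on modest workloads.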


I think engineers learn this quickly in high-scale, performance-sensitive production environments, even without hardware backgrounds. SLAs and costs create constraints you have to optimize against after promising the business line that these magical models can enable that cool new feature for a million users.

Traditional AI/ML models (including smaller transformers) can definitely be optimized for mass scale and performance on CPU-optimized infrastructure.
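
For example, something like dynamic int8 quantization in PyTorch is a cheap win for CPU serving; the checkpoint below is only an illustrative placeholder:

    # Dynamic int8 quantization for CPU inference: nn.Linear weights are
    # stored as int8 and activations are quantized on the fly at runtime.
    import torch
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
    )
    model.eval()

    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    # quantized(...) is then called exactly like the original model,
    # with a smaller memory footprint and faster matmuls on CPU.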



