Out of curiosity I've repeatedly compared the tokens/sec of various open-weight models across hardware, and I keep arriving at the same conclusion: tokens/sec per USD of hardware is nearly constant.
If a $4,000 Mac does something at X tok/s, a $400 AMD PC on pure CPU does it at 0.1*X tok/s.
That assumes good choices for how the money is spent; you can always waste more. As others have said, it's all about memory bandwidth. AMD's "AI Max+ 395" is gonna make this interesting.
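
A rough back-of-envelope sketch of the bandwidth point: decode speed is roughly memory bandwidth divided by the bytes streamed per token (about the model's size for a dense model). The bandwidth, price, and model-size numbers below are illustrative assumptions, not benchmarks:

    # Sketch: decode tok/s is bounded by how many times per second the
    # hardware can stream the model's weights out of memory.
    # All numbers are illustrative assumptions, not measured results.
    def est_tok_per_s(mem_bandwidth_gb_s, model_size_gb):
        return mem_bandwidth_gb_s / model_size_gb

    model_gb = 40  # e.g. a ~70B model at ~4-bit quantization
    machines = {
        "Mac, ~400 GB/s unified memory (~$4,000)": (400, 4000),
        "Desktop, ~50 GB/s dual-channel DDR (~$400)": (50, 400),
    }
    for name, (bw_gb_s, price_usd) in machines.items():
        tps = est_tok_per_s(bw_gb_s, model_gb)
        print(f"{name}: ~{tps:.1f} tok/s, ~{tps / price_usd * 1000:.1f} tok/s per $1k")

Under those assumptions the expensive machine is roughly 8-10x faster, but the tok/s per dollar comes out in the same ballpark for both, which is the pattern described above.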
And of course you can always just not have enough RAM to run the model at all. This tends to happen with consumer discrete GPUs, which don't have much VRAM because they were built for gaming.
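
For the capacity side, a quick sanity check is weights ≈ parameter count × bytes per parameter. This sketch ignores KV cache and runtime overhead, and the VRAM and quantization figures are assumptions for illustration:

    # Rough fit check: does the model's weight footprint fit in VRAM?
    # Ignores KV cache, activations, and runtime overhead, so pad generously.
    def weights_gb(params_billion, bytes_per_param):
        return params_billion * bytes_per_param  # 1e9 params * bytes ~= GB

    vram_gb = 16  # a typical consumer gaming GPU (assumption)
    for name, params_b, bpp in [("7B @ fp16", 7, 2.0),
                                ("7B @ 4-bit", 7, 0.5),
                                ("70B @ 4-bit", 70, 0.5)]:
        gb = weights_gb(params_b, bpp)
        verdict = "fits" if gb < vram_gb else "does not fit"
        print(f"{name}: ~{gb:.0f} GB of weights -> {verdict} in {vram_gb} GB VRAM")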