It helps to be able to run the model locally, but currently that is slow or expensive. The challenges of running a local model beyond, say, 32B parameters are real.
I would be fine, though, with something like 10x the wait time. But I guess consumer hardware needs some serious 'RAM pipeline' (memory bandwidth) upgrades before big models can run even at crawl speeds. A rough estimate of what that bottleneck looks like is below.
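A minimal back-of-the-envelope sketch of why bandwidth, not compute, sets the crawl speed: each generated token has to stream roughly all the active weights from RAM, so tokens/sec is capped at about bandwidth divided by model size. The function name and the hardware numbers below are my own illustrative guesses, not measurements.

```python
# Rough decode-speed estimate: tokens/sec ~ memory bandwidth / bytes of weights
# read per token. All numbers are illustrative assumptions, not benchmarks.

def est_tokens_per_sec(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    weights_gb = params_billion * bytes_per_param  # GB of weights streamed per token
    return bandwidth_gb_s / weights_gb

# 70B model quantized to ~4-bit (~0.5 bytes/param):
print(est_tokens_per_sec(70, 0.5, 80))   # ~2.3 tok/s on dual-channel DDR5 (~80 GB/s)
print(est_tokens_per_sec(70, 0.5, 400))  # ~11 tok/s on a ~400 GB/s unified-memory part
```

So even at 10x patience, a dense model much past 70B on ordinary desktop RAM drops to a token every second or slower, which is why the memory pipeline is the thing that would need the upgrade.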
I really don't want my queries to leave my computer, ever.
It is quite surreal how little hype this 'open weights' model gets.