I run it at home at q8 on my dual Epyc server. I find it to be quite good, especially when you host it locally and are able to tweak all the settings to get the kind of results you need for a particular task.
It helps to be able to run the model locally, but right now that is either slow or expensive. The challenges of running a local model beyond, say, 32B are real.
I would be fine, though, with something like 10x the wait time. But I guess consumer hardware needs a serious memory-bandwidth upgrade before big models can run even at crawl speeds.
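A rough way to see why bandwidth is the limit: during decode, a dense model reads (roughly) its entire weight set once per generated token, so tokens/sec is bounded by memory bandwidth divided by model size. A back-of-envelope sketch, where the bandwidth numbers are illustrative assumptions rather than measurements:

```python
def est_decode_tps(model_size_gb: float, mem_bw_gbs: float) -> float:
    """Rough upper bound on decode tokens/sec for a dense model:
    each token reads all weights once, so tps ~= bandwidth / weight bytes."""
    return mem_bw_gbs / model_size_gb

# Illustrative numbers (assumed, not measured):
# a ~70B-param model at q8 is roughly 70 GB of weights.
dual_epyc_bw = 400.0  # GB/s, ballpark aggregate for a dual-socket Epyc
desktop_bw = 60.0     # GB/s, typical dual-channel DDR5 desktop

print(est_decode_tps(70, dual_epyc_bw))  # a few tokens/sec
print(est_decode_tps(70, desktop_bw))    # well under one token/sec
```

This ignores prompt processing (which is compute-bound) and any KV-cache traffic, but it captures why a server with many memory channels can run a 70B model at usable speed while a desktop crawls on the same weights.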