About a token per second on a Ryzen 7 5800X. If I run it for too long it gets slower as thermal throttling kicks in; I need a better cooling system if I'm going to run it non-stop.
Really fast. I didn't bother timing them, but they're faster than ChatGPT by a long shot. I didn't spend very long with them because the quality is so much worse than the 65B.
I should probably go back and try again to see if it's worth it for the extra speed, now that I've played with 65B for a while.
Good question. I'm measuring it from empty or close to empty. By the time it gets to full I'm also getting thermal throttling (I can tell from the temperature), so it's hard to know how much of the slowdown comes from one or the other.
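If I wanted to tease the two apart, one rough approach would be to log a few numbers while it generates and compare the speed over short windows against the temperature at the same moment. Here's a minimal sketch, assuming you can record the running token count, the elapsed time, and the CPU temperature yourself; `Sample` and `windowed_rates` are names I made up for illustration, not part of any tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """One manual reading taken during generation (hypothetical structure)."""
    tokens_so_far: int   # tokens generated since the start of the run
    elapsed_s: float     # seconds since the start of the run
    temp_c: float        # CPU temperature at that moment, in Celsius


def windowed_rates(samples: List[Sample], window: int = 1) -> List[Tuple[int, float, float]]:
    """Return (tokens_so_far, tokens_per_second, temp_c) for each window.

    Each window spans `window` consecutive readings, so the rate reflects
    recent speed rather than the average over the whole run.
    """
    rows = []
    for i in range(window, len(samples), window):
        start, end = samples[i - window], samples[i]
        dt = end.elapsed_s - start.elapsed_s
        if dt <= 0:
            continue  # skip duplicate or out-of-order readings
        rate = (end.tokens_so_far - start.tokens_so_far) / dt
        rows.append((end.tokens_so_far, rate, end.temp_c))
    return rows
```

If the rate falls while the temperature stays flat, it's probably not the heat; if it falls only once the temperature climbs, throttling is the more likely culprit. Starting a fresh run while the CPU is already hot would be another way to check.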