You don't need concrete numbers or even napkin math to see that a gaming computer running its GPU for a couple of hours in the evening will use much less energy than a GPU running maxed out 24/7 for AI.
There's nothing irrational about suggesting that AI GPUs consume far more power.
Yes, one AI GPU uses more energy than one gaming GPU. The AI GPU, however, is shared among a large number of users, while the gaming GPU serves a single player.
Apparently a single gaming GPU can run an LLM that serves hundreds of concurrent requests:
> Benchmarking Llama 3.1 8B (fp16) on our 1x RTX 3090 instance suggests that it can support apps with thousands of users by achieving reasonable tokens per second at 100+ concurrent requests.
But that's a tiny model; it's the smallest version of Llama 3.1. The commercially marketed models are far bigger: GPT-4, for example, has been estimated at about 1.76 trillion parameters, 220 times more than the Llama build you mentioned. Their resource and performance requirements are vastly different.
You're essentially arguing that shipping naval diesel engines must be trivial because you can fit a dozen moped motors on the bed of your pickup truck just fine.
Okay, but these tiny models are being used by people and businesses instead of GPT-4. My point was that they consume less energy per user than a rig used for gaming.
I have no insight into how many GPT-4 users are served per GPU, but I would assume OpenAI optimizes heavily for that, considering the cost of running that thing. It's probably in the same ballpark: hundreds to thousands of concurrent requests per GPU. That's still better than one GPU per gamer, even if it requires 10x the energy.
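The per-user argument can be put into rough numbers. This is a napkin-math sketch, not a measurement: the wattages are assumptions for illustration, and the 100-concurrent-requests figure is borrowed from the RTX 3090 benchmark quoted above.

```python
# Napkin math: energy per user, gaming GPU vs shared inference GPU.
# All wattages are assumed round numbers, not measured values.
GAMING_GPU_WATTS = 350      # assumed draw of a gaming card under load
AI_GPU_WATTS = 700          # assumed draw of a datacenter card, ~2x a gaming card
CONCURRENT_USERS = 100      # concurrent requests, per the quoted 3090 benchmark

per_gamer = GAMING_GPU_WATTS / 1                # one card, one player
per_ai_user = AI_GPU_WATTS / CONCURRENT_USERS   # one card shared by many users

print(f"{per_gamer:.0f} W per gamer vs {per_ai_user:.0f} W per AI user")
```

Under these assumptions the shared card works out to a few watts per concurrent user; even if the AI card drew 10x the power of the gaming card, sharing it across 100 users would still put it well below one card per gamer.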
I don’t claim to know, but we ought to be able to have a rational debate on this.