The issue is that you are taking max GPU power draw as a given. Running an LLM does not tax a GPU the same way a game does. There is a rather well-known YouTuber who ran LLMs on a 4090, and the actual power draw was only 130W on the GPU.
Now add that this guy has 7x 3060, which means he is 100% a miner. So you know that he is running an optimized (underclocked) profile.
FYI, my gaming 6800 draws 230W, but with a bit of undervolting and sacrificing 7% performance, it runs at 110W for the exact same load. And that is with the GPU 100% taxed. This is just a simple example to show that a lot of PC hardware runs very much overclocked/unoptimized out of the box.
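To put a number on that 6800 example, here is a quick back-of-the-envelope sketch. It only uses the figures above (230W stock, 110W tuned, 7% performance given up); the function name is made up for illustration.

```python
def perf_per_watt_gain(stock_watts, tuned_watts, perf_loss):
    """Return how many times more work per watt the tuned card delivers.

    perf_loss is the fraction of performance given up (0.07 means 7%).
    Stock performance is normalised to 1.0.
    """
    stock_efficiency = 1.0 / stock_watts
    tuned_efficiency = (1.0 - perf_loss) / tuned_watts
    return tuned_efficiency / stock_efficiency


# Figures from the 6800 example: 230W stock, 110W undervolted, 7% slower.
gain = perf_per_watt_gain(230, 110, 0.07)
print(f"{gain:.2f}x performance per watt")
```

So roughly twice the work per watt, for a performance loss you will barely notice.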
Somebody getting down to 520W sounds perfectly normal for undervolted cards that give up maybe 10% performance for big savings in power draw.
And no, old hardware can be extremely useful in the right hands. Add to this that the main factor influencing speed when running an LLM tends to be memory (how much you can fit, and the interconnects) rather than actual processing performance.
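As a rough illustration of why memory matters more than compute here: for single-stream generation, every token has to read roughly all the weights once, so memory bandwidth puts a ceiling on tokens per second. A minimal sketch, assuming the 3060 12GB's spec-sheet bandwidth of about 360 GB/s and a hypothetical 40 GB quantized model split layer-wise across the cards (both numbers are assumptions for illustration, not measurements from this rig):

```python
def max_tokens_per_second(model_gb, bandwidth_gb_s):
    """Upper bound on single-stream decode speed for a memory-bound model.

    Assumes each generated token streams all weights once and that the
    cards work one after another (layer split), so their bandwidths do
    not add up. Ignores KV cache reads, interconnect latency and compute.
    """
    return bandwidth_gb_s / model_gb


# Assumed values, for illustration only.
MODEL_GB = 40.0          # hypothetical quantized model size
BANDWIDTH_GB_S = 360.0   # RTX 3060 12GB spec-sheet memory bandwidth

ceiling = max_tokens_per_second(MODEL_GB, BANDWIDTH_GB_S)
print(f"at most ~{ceiling:.0f} tokens/s")
```

Real numbers will land below that ceiling, but it shows why shaving a bit off the core clock costs so little as long as the memory clock stays untouched.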
Being able to run a large model for 1600 sounds like a bargain to me. Also, remember, when you're not querying the models, the power draw will be mostly the memory wakes + power regulators. Coming back to that YouTuber, he was not constantly drawing that 130W; it only spiked when he ran prompts or did activity.
Yes, running from home will be more expensive than a $10 Copilot plan, but ... nobody is looking at your data either ;)
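If you want to estimate what that costs per month, the spiky usage pattern matters more than the peak number. A small sketch: the 520W active draw is the figure quoted in this thread, while the idle draw, hours of active use per day, and electricity price are hypothetical placeholders you should replace with your own.

```python
def monthly_kwh(active_watts, idle_watts, active_hours_per_day, days=30):
    """Energy used per month when the rig is only busy part of the day."""
    idle_hours_per_day = 24 - active_hours_per_day
    wh_per_day = (active_watts * active_hours_per_day
                  + idle_watts * idle_hours_per_day)
    return wh_per_day * days / 1000


ACTIVE_WATTS = 520       # figure quoted in this thread
IDLE_WATTS = 100         # hypothetical idle draw for the whole rig
ACTIVE_HOURS = 2         # hypothetical hours per day spent running prompts
PRICE_PER_KWH = 0.30     # hypothetical electricity price

kwh = monthly_kwh(ACTIVE_WATTS, IDLE_WATTS, ACTIVE_HOURS)
cost = kwh * PRICE_PER_KWH
print(f"{kwh:.1f} kWh per month, about {cost:.2f} in electricity")
```

With these placeholder inputs most of the energy goes to idling, not to inference, which is why the idle draw is the number worth measuring.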
Thanks for the clarification. Sure, if I run a hashcat benchmark the power consumption goes up to nearly 1400 W, but I also limited the max power consumption for each card to 100 W, which worked out better than limiting the max GPU frequency. To be fair, most of the speed comes from the RAM frequency; as long as this is not limited, it works out great.
I took a fair amount of time to get everything to a reduced power level and measured several LLM models (and hashcat for the extreme case) to find the best speed per watt, which is usually at around 1700-1900 MHz, or when limiting each 3060 to 100-115 W.
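The search for the best speed per watt can be sketched roughly like this. The numbers in `SAMPLE` are made-up placeholders, not real measurements, and the helper names are hypothetical; the only real thing is the `nvidia-smi -i <gpu> -pl <watts>` command line, which sets a power limit and normally needs root.

```python
def best_power_limit(measurements):
    """Pick the power limit with the highest tokens per second per watt.

    measurements maps a power limit in watts to the measured tokens/s.
    """
    return max(measurements, key=lambda watts: measurements[watts] / watts)


def power_limit_command(gpu_index, watts):
    """Build (but do not run) the nvidia-smi call that applies the limit."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


# Placeholder data: power limit in watts -> tokens per second.
SAMPLE = {80: 9.0, 100: 12.5, 115: 13.8, 130: 14.2, 170: 15.0}

best = best_power_limit(SAMPLE)
print(best, power_limit_command(0, best))
```

The command is only built as a list here, so nothing gets changed on the machine until you pass it to something like `subprocess.run` yourself.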
If I had planned it from the start, I might have gotten away with a used Mac Studio, that's right. However, I incrementally added more cards as I moved further into exploration.
I didn't want to confront anyone, but it looks like you either show off 4x 4090 or you keep silent.
I am amazed these days by how many people lack knowledge about hardware, and about the massive benefits of undervolting/power limiting it. It's like people do not realize that what is sold is often overclocked or running too high a vcore. The number of people I see buying insanely overspecced PSUs makes me go O_o ...
How is your performance with the different models on your setup?
"Undervolting" is a thing for 3090s, where they get them down from 350W to 300W at a 5% perf drop, but for your case it's irrelevant because your lane budget is far too small!
> know Youtuber, that ran LLMs on a 4090, and the actual power draw was only 130W on the GPU.
Well, let's see his video. He must be using some really inefficient backend implementation if the GPU was utilised that little.
I'm not running e-waste. My cards are L40S, and even in basic inference with no batching, using the ggml CUDA kernels, they get to 70% util immediately.