Depends on the model; if it doesn't fit into VRAM, performance will suffer. Response here is immediate (at ~15 tokens/sec) on a pair of eBay RTX 3090s in an ancient i7-3770 box.
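A quick way to check whether a model actually fits (exact output varies by ollama version, so treat the sample lines as approximate):

  # after a query, see where ollama put the model
  ollama ps
  #   NAME           SIZE    PROCESSOR
  #   llama3.3:70b   ~44 GB  100% GPU    <- fully in VRAM; a CPU/GPU split means it spilled over
  # per-card memory use
  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv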
£1200, so a little less. Targeted at having 48GB (2x24GB) of VRAM for running the larger models; having said that, a single 12GB RTX 3060 in another box seems pretty close in local testing (with smaller models).
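Rough sizing arithmetic (mine, not gospel): 4-bit quantised weights take roughly 0.5-0.6 GB per billion parameters, plus a few GB for KV cache and overhead, so an ~8B model sits comfortably in the 3060's 12GB while a 70B needs the full 48GB.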
Have been trying forever to find a coherent guide on building a dual-GPU box for this purpose; do you know of any? Like selecting the MB, the case, cooling, power supply and cables, any special voodoo required to pair the GPUs, etc.
I'm not aware of any particular guides; the setup here was straightforward - an old motherboard with two PCIe x16 slots (Asus P8Z77-V or P8Z77 WS), a big enough power supply (Seasonic 850W) and the stock Linux Nvidia drivers. The RTX 3090s are basic Dell models (i.e. not OC'ed gamer versions), and worth noting they only get hot if used continuously - if you're the only one using them, the fans spin up during a query and back down between. A good smoke test is something like: while true; do ollama run llama3.3 "Explain cosmology"; done
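To watch the thermal behaviour while that loop runs, something like this in a second terminal does the job (standard nvidia-smi query fields, refreshed every 5 seconds):

  nvidia-smi --query-gpu=index,temperature.gpu,fan.speed,memory.used --format=csv -l 5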
With llama3.3 70B, two RTX 3090s give you 48GB of VRAM and the model uses about 44GB; so the first start is slow (loading the model into VRAM) but after that response is fast (subject to the comment above about KEEP_ALIVE).
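If the first-start delay matters, you can pay the load cost up front by preloading the model (this is from the ollama FAQ, so check it against your version):

  # empty prompt loads the model into VRAM without generating anything
  ollama run llama3.3 ""
  # or via the API
  curl http://localhost:11434/api/generate -d '{"model": "llama3.3"}'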
Even if your model does fit into VRAM, if it's getting ejected after the idle timeout there will be a startup pause on the next query. Try setting OLLAMA_KEEP_ALIVE to -1 to keep it loaded (see https://github.com/ollama/ollama/blob/main/docs/faq.md#how-d...).
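On a stock Linux install ollama runs as a systemd service, so the usual way to set that (per the same FAQ) is an override, roughly:

  sudo systemctl edit ollama.service
  # in the editor, under [Service]:
  #   Environment="OLLAMA_KEEP_ALIVE=-1"
  sudo systemctl daemon-reload
  sudo systemctl restart ollama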