
There's a huge step up in capability at 16GB and 24GB, for not too much more money. The 4060 Ti has a 16GB version, for example. On the cheap end, the Intel Arc cards do too.

The next major step up is 48GB, and then hundreds of GB. But a lot of ML models target 16-24GB since that's in the grad student price range.
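
Rough back-of-the-envelope for why that tier lines up with model sizes: fp16 weights are 2 bytes per parameter, plus some headroom for KV cache and activations. A minimal Python sketch (the 20% overhead factor is an assumption, not a measurement):

    # Rough VRAM estimate for inference: bytes per parameter times
    # parameter count, plus headroom for KV cache / activations.
    def vram_gb(params_billions, bytes_per_param=2, overhead=1.2):
        return params_billions * bytes_per_param * overhead

    print(vram_gb(7))   # ~16.8 GB: a 7B model at fp16 fits a 24GB card
    print(vram_gb(13))  # ~31.2 GB: 13B at fp16 needs quantization for 24GB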



At the 48GB level, L40S are great cards and very cost effective. If you aren’t aiming for constant uptime on several >70B models at once, they’re for sure the way to go!
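
For scale (my arithmetic, not the parent's): a 70B model's weights alone are ~140GB at fp16, so a single 48GB card only runs one with aggressive quantization.

    # Weights-only footprint of a 70B model at different precisions
    # (KV cache / activation overhead excluded).
    for bits in (16, 8, 4):
        print(bits, 70e9 * bits / 8 / 1e9, "GB")
    # 16 -> 140.0 GB  (needs 3+ 48GB cards)
    #  8 ->  70.0 GB  (needs 2)
    #  4 ->  35.0 GB  (fits on one L40S)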


> L40S are great cards and very cost effective

from https://www.asacomputers.com/nvidia-l40s-48gb-graphics-card....

NVIDIA L40S 48GB graphics card. Our price: $7,569.10*

Not arguing against 'great', but the cost efficiency is questionable: for about 10% of that you can get two used 3090s. The good thing about LLMs is that they're sequential, so they should be easy to parallelize: the model can be split into several sub-models, one per GPU. Then 2, 3, 4... GPUs should improve throughput proportionally on big batches, and make it possible to run a bigger model on low-end hardware.
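
A minimal sketch of that split in PyTorch; the device IDs and layer split point are illustrative. Note a naive split like this runs the stages serially, so the proportional speedup the comment describes only shows up once you stream micro-batches through the pipeline (GPipe-style):

    import torch
    import torch.nn as nn

    class TwoStagePipeline(nn.Module):
        # Naive model parallelism: first half of the layers live on GPU 0,
        # second half on GPU 1; activations hop devices between stages.
        def __init__(self, dim=4096, layers=8):
            super().__init__()
            half = layers // 2
            self.stage0 = nn.Sequential(
                *[nn.Linear(dim, dim) for _ in range(half)]).to("cuda:0")
            self.stage1 = nn.Sequential(
                *[nn.Linear(dim, dim) for _ in range(layers - half)]).to("cuda:1")

        def forward(self, x):
            x = self.stage0(x.to("cuda:0"))
            return self.stage1(x.to("cuda:1"))

    model = TwoStagePipeline()
    out = model(torch.randn(32, 4096))  # a big batch flows through both GPUs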


Dual 3090s are way cheaper than the L40S though. You can even buy a few backups.


Yeah, I’m specifically responding to the parent’s comment about the 48GB tier. When you’re looking in that range, it’s usually because you want to pack as much VRAM as possible into your rack space, so consumer-level cards are off the table. I definitely agree multiple 3090s are the way to go if you aren’t trying to host models for smaller-scale enterprise use, which is where 48GB cards shine.



