
Looking at the top-end H100 80GB systems with NVLink from HPC vendors, it occurred to me that we are about to swing back to massive, almost mainframe-like form-factor systems built around a giant bus, like the old expandable Q-bus in the 80s, but this time for GPUs.

What I mean is that they currently offer systems with 8x cards, but given the compute requirements of these huge LLMs, systems with 32+ cards all on a dedicated memory bus (NVLink) are probably what will be needed as weight sizes expand. This is all for inference, btw, not even training, though the same holds for training: you probably want the best possible interconnect between the same monster systems.
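
To put rough numbers on it (a back-of-the-envelope sketch, assuming a hypothetical 175B-parameter model in fp16, ignoring KV cache and activation memory entirely):

    # How many 80 GB cards just to hold the weights?
    # Hypothetical 175B-parameter model; numbers are illustrative only.
    params = 175e9
    bytes_per_param = 2                      # fp16/bf16 weights
    weight_gb = params * bytes_per_param / 1e9
    print(weight_gb)                         # ~350 GB of weights alone
    print(weight_gb / 80)                    # ~4.4 x 80 GB cards before KV cache or activations

So even a single model of that size already spills across several cards, and batching, longer contexts, and bigger models push you toward the 32+ card picture quickly.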

I’m dreaming that there might then be a distributed, eventually consistent, partial training algorithm that would democratize the creation of these models.
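
A minimal sketch of the kind of thing I mean, in the spirit of local SGD / federated averaging: workers train independently on their own data and only occasionally merge weights with possibly stale copies from peers. Everything here (the toy model, the gradient stand-in, the sync interval) is hypothetical and only illustrates the "eventually consistent" idea:

    import numpy as np

    def local_step(weights, grad, lr=1e-2):
        # Ordinary SGD step on this worker's own data shard.
        return weights - lr * grad

    def merge(local, peer_copies):
        # Lazy consistency: average with whatever (possibly stale) peer weights arrived.
        return np.mean([local] + peer_copies, axis=0)

    weights = np.zeros(10)
    for step in range(1000):
        grad = np.random.randn(10)                  # stand-in for a real gradient
        weights = local_step(weights, grad)
        if step % 100 == 0:                         # infrequent, asynchronous sync
            stale_peers = [weights + 0.01 * np.random.randn(10)]  # simulated peer copies
            weights = merge(weights, stale_peers)

Whether something like this can actually match synchronous data-parallel training at LLM scale is exactly the open question.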

As for smaller-scale individual systems for inference: if one has the resources, is fairly technical, and can make use of such technology, then perhaps in 5-10 years the wealthy might buy $50K+ units that get installed in their homes or something.

Really incredible developments happening very quickly. Apologies for the potentially inappropriately long rant in reply to the previous comment.



The other possibility is that we'll get cards designed very specifically for LLMs, basically ditching everything that is not strictly necessary for the sake of squeezing in more compute / VRAM, and perhaps optimizing around int4/int8 (the latter is apparently "good enough" for training?).
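
For what it's worth, the simplest form of the int8 idea is just storing weights with a per-tensor scale and dequantizing on the fly. A toy numpy sketch of absmax quantization, nothing vendor-specific and purely illustrative:

    import numpy as np

    def quantize_int8(w):
        # Absmax quantization: map the largest magnitude to 127.
        scale = np.abs(w).max() / 127.0
        q = np.round(w / scale).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    print(np.max(np.abs(w - w_hat)))   # small reconstruction error, at 4x less memory than fp32

Hardware built only around formats like this could drop a lot of the general-purpose silicon a gaming or HPC card carries.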



