
Comparing multicore wide AVX to CUDA is a bit of an unnecessary nuance for most folks. These comparisons make sense, but they miss the forest for the trees:

- Either way, you're writing 'CUDA-style' fine-grained data-parallel code that looks and works very differently from regular multithreaded code (sketch below). You are now in a different software universe.

- You now also have to think about throughput, latency hiding, etc. Nvidia has been commoditizing throughput-oriented hardware a lot better than others, and while AMD is catching up on some workloads, Nvidia keeps advancing. This is where we think about bandwidth from network/disk => compute unit. My best analogy here, when looking at things like GPUDirect Storage/Network, is that CPU data paths feel like a long twisty straw, while GPU paths are fat pipes. Big compute typically needs both compute and IO, and the hardware specs tell you the bandwidth ceiling.

To a large extent, the ideas are cross-pollinating -- CPUs are looking more like GPUs, and GPUs are getting the flexibility of CPUs -- but either way, you're in a different universe of how code & hardware work than 1990s and early-2000s Intel.
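
A minimal sketch of that shift (plain Python/NumPy, names illustrative): instead of looping and mutating one element at a time, you express the whole-array operation and let the runtime map it onto wide SIMD lanes or thousands of GPU threads.

    import numpy as np

    x = np.random.rand(1_000_000).astype(np.float32)
    y = np.random.rand(1_000_000).astype(np.float32)

    # Loop-and-mutate style (classic CPU thinking): one element at a time.
    def saxpy_loop(a, x, y):
        out = np.empty_like(x)
        for i in range(len(x)):
            out[i] = a * x[i] + y[i]
        return out

    # Data-parallel style: describe the whole-array op; the runtime
    # decides how to spread it across lanes or threads.
    def saxpy_parallel(a, x, y):
        return a * x + y

Same math, but the second form is what ports to throughput hardware.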



Realistically you should use NumPy or CuPy (or whatever the appropriate/fashionable library is) anyway, because tuning this stuff is a big pain.
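
A minimal sketch of why the library route is pragmatic: CuPy mirrors most of the NumPy API, so the same function can run on CPU or GPU by swapping the array module (the softmax helper here is just illustrative).

    import numpy as np
    # import cupy as cp  # same API, runs on the GPU if available

    def softmax(xp, v):
        # xp is the array module: np for CPU, cp for GPU
        e = xp.exp(v - v.max())
        return e / e.sum()

    v = np.array([1.0, 2.0, 3.0])
    print(softmax(np, v))
    # On GPU: softmax(cp, cp.asarray(v)), then cp.asnumpy(...) to copy back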

So, GPUs have the slight disadvantages that you have to think about data movement and the drivers are a little less convenient to install, but it isn’t really a big deal.


I am a big fan of jax for numerical computations these days.
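
For anyone curious, a small taste (a minimal sketch; assumes jax is installed): you write plain array code, and the jit/grad transforms compose and compile it via XLA for CPU/GPU/TPU.

    import jax
    import jax.numpy as jnp

    def loss(w, x):
        return jnp.sum((x @ w) ** 2)

    # grad differentiates, jit compiles the result via XLA.
    grad_loss = jax.jit(jax.grad(loss))

    w = jnp.ones(3)
    x = jnp.arange(6.0).reshape(2, 3)
    print(grad_loss(w, x))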


I’ve been seeing lots of posts about it lately. Haven’t had a chance to try it out, though.


Agreed! The bigger shift is switching to data-parallel coding styles.



