Hacker News

Does anyone have resources for a good way to get started with this sort of modern GPU systems work?


If you'd like a practical goal, you probably want to learn PyTorch and pick up a little background on GPU memory architecture. If you want to go deep, learn CUDA: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index....


Yes, I know pytorch well at this point and have basic memory architecture understanding. In the process of learning CUDA, but would love pointers for depth/intermediate things to explore.


I found this talk helpful. https://on-demand.gputechconf.com/gtc/2017/presentation/s712...

Have you tried the Visual Profiler yet?


I found it helpful to start with CUDA via numba, since it lets you write GPU kernels in Python. Assuming you're like most ML engineers and more familiar with Python than C++, this lets you learn CUDA concepts separately instead of also learning C++ at the same time. There's also a set of GPU puzzles for beginners [1] you can use to get started with numba CUDA.

[1] https://github.com/srush/GPU-Puzzles


Thanks for the link! Sasha is actually my former professor - if this is anything like his past pytorch puzzles I'm sure I'll find it enjoyable.


I'd start with the example of implementing the fastest reduction you possibly can. Pretty much all the complexity in every kernel used in ML stems from this concept (reductions with addition).

https://developer.download.nvidia.com/assets/cuda/files/redu...
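For intuition about what that PDF builds up to, here is a plain-Python/NumPy sketch (my own, a CPU stand-in rather than real CUDA) of the tree-reduction pattern: at each step, every "thread" with index below the stride adds in the element one stride away, halving the active count until a single partial sum remains.

```python
# CPU sketch of a shared-memory tree reduction with addition.
# 'block' stands in for one thread block's shared-memory buffer;
# its length is assumed to be a power of two.
import numpy as np

def block_reduce_sum(block):
    shmem = np.asarray(block, dtype=np.float64).copy()
    stride = len(shmem) // 2
    while stride > 0:
        # In CUDA every thread i < stride does this step in parallel,
        # followed by __syncthreads(); here we just loop sequentially.
        for i in range(stride):
            shmem[i] += shmem[i + stride]
        stride //= 2
    return shmem[0]

data = np.arange(16, dtype=np.float32)
print(block_reduce_sum(data), data.sum())  # both print 120.0
```

The PDF's successive optimizations (sequential addressing, unrolling, warp-level tricks) all keep this log2(n)-step structure and just reduce its overhead on real hardware.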


I love this PDF; it has some crazy staying power, given that it's referring to a G80 GPU in there!


Thank you for the suggestion - will take a look!




