Full disclosure: we have a contract with AMD to get Llama 405B training on MI350X on MLPerf.
Things are turning around for AMD. If you have an AMD card, go to pytorch.org, click Linux+ROCm and install PyTorch. 3 years ago, this was hopeless. Today, most mainline things work. I ran nanochat on MI300X and it just worked. I think that's true about MI350X now too. The MI350X machine is stable.
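If you want to sanity-check the install, something like the following works; the index URL below is just what the pytorch.org selector currently produces for one ROCm version, so treat it as a placeholder:

    # install command from the pytorch.org selector (Linux + ROCm);
    # the rocm6.2 suffix changes per release:
    #   pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
    import torch

    # ROCm builds of PyTorch reuse the torch.cuda API, so this works on AMD cards
    print(torch.cuda.is_available())      # True if the GPU is visible
    print(torch.cuda.get_device_name(0))  # e.g. "AMD Instinct MI300X"
    print(torch.version.hip)              # HIP version string on ROCm builds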
They are clearly behind NVIDIA, nobody doubts that. And a lot of investment in software will be required to catch up: ecosystem, compiler, and driver. But 2 years ago they seemed hopeless; now they don't. Things take time. HipKittens is a great codebase to study to see where AMD's LLVM backend is still lacking; compare it to the CUDA Kittens.
For training, it's NVIDIA and Google in first. AMD in second. And nobody in third. Intel and Tenstorrent are not remotely close. Huawei examples segfaulted. Groq gave up selling chips. Cerebras isn't available anywhere. Trainium had a 5-day wait to get one instance and I lost interest.
Speaking as the CEO of an AMD NeoCloud for the past 2 years, it is so nice to hear all this and to see the turnaround. It is what I bet my business on from the start, and I can concur with what George is saying 100%.
The out-of-the-box experience can be a bit rough around the edges on bleeding-edge stuff, but it isn't anywhere near as bad as it used to be. For example, a month ago nanochat wasn't working well, and now it is. The important thing is that people now care enough to make it work.
At the end of the day, AI does need viable options. Having a monopoly on all AI hardware and software might be a good thing for shareholders, but it isn't a good thing for what is looking like a fundamental technology, akin to the internet.
That’s interesting. I was specifically looking for AMD hardware offered by neoclouds; they seem to be rare.
I like your bet though. For decades there has been no real difference between NVDA and AMD at the hardware level. AMD has always been on par, and software is software; it will catch up.
AMD will be a stock many people miss, because the opportunity has presented itself at the height of AI bubble talk, and that will leave many in the dust. A doubling or tripling of their market cap is pretty much a foregone conclusion.
You're right, it is a much smaller ecosystem, but I think that is partly intentional as a way to focus efforts and not feed into the bubble, which I feel is a smart move. These are the official partners [0]. I'm Hot Aisle.
George was very smart: $500k in the $90s. I saw it coming even earlier than he did, but that's because I was already aware the hardware was good from my own experience.
Will it catch up or will it forever chase nvidia's tail? I'm betting on the latter unless another AI winter happens. And contrary to anti-generative AI social media talking points, the literature suggests The Red Queen's race is continuing apace IMO.
Nvidia remains undefeated at responding to hardware threats with hardware diving catches to this day. What scenario prevents them from yet another one of their diving catches? I'm genuinely curious as to how one could pull that off. It's like challenging Google in search: even if you deliver better product and some have, the next thing you know Google is doing the same thing or better with deeper pockets.
> Nvidia remains undefeated at responding to hardware threats with hardware diving catches to this day. What scenario prevents them from yet another one of their diving catches?
The fact that they have made roughly the same hardware as AMD for the last two decades, and even today. There was no diving catch; AMD just ignored what its hardware was capable of and didn't back OpenCL. For example, just in this thread alone, AMD paid someone to make this shit work on their hardware. Don't bet against what's coming.
Except no, AMD 100% played follow the leader with technology like CUDA, NVLink, and tensor cores.
Even paying someone in academia to get s** to work on their hardware is yet another example of follow the leader.
What exactly do you think is coming? I think the biggest threat is one or more Chinese companies catching up on both hardware and ecosystem in the next half decade or so myself, mostly because of the state level support for making that so. But I absolutely don't expect an x86_64 moment for GPUs here given past results and the current bias against software in AMD's HW culture. Convince me otherwise.
1 and 2 are supported: 1 you need to specify, 2 will be found by BEAM. We are working on reimplementing HipKittens in tinygrad; all the pieces are there to do it. See the amd_uop_matmul example.
tinygrad doesn't support 3 yet; it's not needed on any AMD GPUs, and not needed on NVIDIA consumer cards. It wouldn't be hard to add, but it's important to figure out how it best fits with the existing abstractions. I think everything will eventually move to a more producer-consumer model.
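For context, BEAM is tinygrad's built-in kernel search: set the BEAM environment variable and it searches over kernel optimizations instead of relying on hand-tuned heuristics. A minimal sketch; the matmul is just illustrative:

    # run as: BEAM=2 python matmul.py  (BEAM is read from the environment)
    from tinygrad import Tensor

    a = Tensor.rand(4096, 4096)
    b = Tensor.rand(4096, 4096)
    # with BEAM set, tinygrad beam-searches a schedule for this kernel
    # and caches the winner for later runs
    (a @ b).realize()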
Right now, AI support on AMD is official only for specific GPU models, but they are working hard to turn this around and broaden support. And they're making progress.
Vulkan compute is also getting some good press as a local LLM platform (at least on the Linux side); it will be interesting to see which crosses the line to "can ship production-quality apps on this" first.
Nope! It works fine with a somewhat recent in-tree kernel. The AMD driver is actually open source, not just a wrapper around a big on-device blob like the NVIDIA one. tinygrad also has a driver that doesn't even need the kernel module; it just mmaps the PCIe BAR into Python.
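The userspace-driver trick is mostly standard Linux plumbing: sysfs exposes each PCIe BAR as a file you can mmap. A minimal sketch with a made-up device address; tinygrad's real driver obviously builds a lot more on top of this:

    import mmap
    import os

    # hypothetical PCI address; find your GPU's with `lspci -d 1002:` (AMD's vendor ID)
    BDF = "0000:03:00.0"

    # resource0 is BAR0; mapping it requires root
    fd = os.open(f"/sys/bus/pci/devices/{BDF}/resource0", os.O_RDWR | os.O_SYNC)
    bar = mmap.mmap(fd, 0)  # length 0 maps the whole BAR

    # reads and writes through this mapping hit the device directly
    print(bar[:4].hex())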