To focus just on the "better in every regard" part of the comment:
One area where Cerebras systems perform well is training models that require long sequence lengths or high-resolution images.
A concrete example is this publication, https://www.biorxiv.org/content/10.1101/2022.10.10.511571v2, where researchers built genome-scale language models that learn the evolutionary landscape of SARS-CoV-2 genomes. In the paper, the researchers note: "We note that for the larger model sizes (2.5B and 25B), training on the 10,240 length SARS-CoV-2 data was infeasible on GPU clusters due to out-of-memory errors during attention computation."
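To see why attention at that sequence length overwhelms GPU memory, here's a rough back-of-envelope sketch. The head count, batch size, and dtype below are hypothetical illustrative values, not numbers from the paper:

```python
# Back-of-envelope memory estimate for the naive attention score matrix.
seq_len = 10_240          # sequence length cited in the paper
num_heads = 32            # hypothetical head count for a multi-billion-param model
batch_size = 8            # hypothetical per-device batch size
bytes_per_elem = 2        # fp16/bf16

# Naive attention materializes a (seq_len x seq_len) score matrix per head,
# per sequence, so this term grows quadratically with sequence length.
scores_bytes = batch_size * num_heads * seq_len * seq_len * bytes_per_elem
print(f"Attention scores alone: {scores_bytes / 2**30:.1f} GiB")  # ~50.0 GiB
```

That ~50 GiB is just the score matrices, before weights, other activations, gradients, and optimizer state. The quadratic term is the core issue: doubling the sequence length quadruples it, which is consistent with the out-of-memory errors the authors hit on GPU clusters at 10,240 tokens.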