It goes way beyond that. Transformers are the first practical, scalable, general-purpose differentiable (i.e. trainable with gradient descent) algorithm. We haven't come close to seeing the limits of what they can do, because everything so far suggests their only limit is our current hardware. And hardware is improving at a much faster and steadier rate than algorithms in computer science these days.
everything? they've solved reinforcement learning? they can handle continuous domains, like robot motion? that's funny, I thought they could only handle sequences of tokens.*
yes, they're exciting, and they are the most general architecture we've found so far, but there are important problems in AI (like anything continuous) that they're really not suited for.
I think there are better architectures out there for many tasks, and I'm a little dismayed that everyone seems to be cargo-culting the GPT architecture rather than taking the lessons from transformers and experimenting with more specialized algorithms.
*btw they don't need quantized tokens; there's no reason they can't just work on continuous vectors directly, and they don't have to be causal or limited to one sequence. But "transformer" seems to mean GPT in everyone's mind, and even though the original transformer was an encoder-decoder model, we rarely seem to see those these days for some reason.
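to make the footnote concrete, here's a minimal numpy sketch (names and shapes are mine, purely illustrative) of single-head self-attention applied directly to continuous vectors: no vocabulary, no quantization, and no causal mask, so every position attends to every other.

```python
import numpy as np

# Illustrative sketch: self-attention over continuous inputs, no tokens involved.
rng = np.random.default_rng(0)
seq_len, d_model = 6, 8  # e.g. 6 continuous observations (robot states, audio frames, ...)

x = rng.normal(size=(seq_len, d_model))   # continuous input vectors, no vocabulary lookup
Wq = rng.normal(size=(d_model, d_model))  # learned projections (random here for the sketch)
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d_model)       # full bidirectional attention, no causal mask
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the whole sequence
out = weights @ v                         # continuous outputs, same shape as the inputs

print(out.shape)  # (6, 8)
```

nothing here assumes discrete tokens; a causal GPT-style model is just this plus an extra masking step on `scores`.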
There are two ways you can solve currently intractable problems: find better algorithms or improve the hardware. It's actually insanely hard to come up with new algorithms; that's why machine learning and AI lagged behind most of computer science for decades.