I agree that Karpathy's YouTube video is an excellent resource for understanding Transformers from scratch. It provides a hands-on experience that can be particularly helpful for those who want to implement the models themselves. Here's the link to the video titled "Let's build GPT: from scratch, in code, spelled out": https://youtu.be/kCc8FmEb1nY
I endorse all of this, and would further recommend (probably as a follow-up once you have a basic grasp) "A Mathematical Framework for Transformer Circuits". It builds up a lot of really useful ideas for understanding how and why transformers work, and for starting to treat them as something other than magical black boxes.
For broader coverage of Transformers, these may also be useful:
* The Illustrated Transformer by Jay Alammar: http://jalammar.github.io/illustrated-transformer/
* MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention: https://www.youtube.com/watch?v=ySEx_Bqxvvo
* Karpathy's course, Deep Learning and Generative Models (Lecture 6 covers Transformers): https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs......
These resources cover different aspects of Transformers and can help you grasp the underlying concepts and mechanisms better.