The Transformer: A Guided Tour
Preface
Behold the transformer, the mechanical brain that powers language models, chatbots, and agents. These remarkable systems are scientific marvels that seem poised to do for cognitive work what the combustion engine did for locomotion. They are also famously inscrutable black boxes. Because they are effectively “grown” rather than built, we do not understand their inner workings in the detail we understand other major inventions such as the combustion engine. Yet at their core, these complex and versatile systems just repeatedly apply two alternating “thought-like” operations: attention and inspection. If we understand these two operations and a handful of other core concepts, we can demystify the transformer. To this end, we use analogies to human thinking and interactive visualizations to build intuition for what these operations are doing and why.
Most of the complexity and the barriers to understanding transformers come from the sheer scale of these systems: huge matrices with billions of parameters operating on very high-dimensional vector spaces. Examined in 2D and 3D, however, the same operations become much easier to understand. Real transformers perform precisely the same operations, just in a vastly larger space that our brains aren’t equipped to visualize. The math and rules governing these operations are the same in any dimension. We build intuition in the lower-dimensional cases, firmly grounding the core concepts, and then scale them up. I believe this approach enables a much deeper understanding of the transformer without needing to understand all the math and technical details. My hope is to demystify AI tools for a general audience, and to give researchers, builders, and students of these systems a new perspective and a deeper understanding of how they work, what they are doing, and how to build them.
So come along and see the sights on a guided tour of the transformer!
This is a work in progress. I keep changing exactly what I’m trying to say and to whom.
Currently, I’m focused on writing for a general audience, though I hope the perspective is novel enough that experts will find it interesting as well. I’m also working on more technical chapters that cover advanced topics with heavier math and code. Alternatively, I may keep targeting a general audience and try to explain the advanced topics in a similar way.
Feedback is very welcome!