Transformers & Attention
From the original paper to multi-head attention, positional encoding, and why the architecture took over everything.
- Self-Attention
- Multi-Head
- Positional Encoding
- Softmax
- Scaled Dot-Product
Input Tokens → Q / K / V → Attention Scores → Weighted Sum → Output
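The pipeline above can be sketched in a few lines of plain Python. This is a minimal, single-head illustration of scaled dot-product attention, not a reference implementation: the function names (`softmax`, `matmul`, `attention`) and the projection matrices `w_q`, `w_k`, `w_v` are assumptions made for the example, and it leaves out masking, multiple heads, and positional encoding.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x m) times (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def attention(x, w_q, w_k, w_v):
    # Input tokens -> Q / K / V via (hypothetical) projection matrices.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Attention scores: dot product of every query with every key,
    # scaled by sqrt(d_k).
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in k]
              for q_row in q]
    # Softmax turns each row of scores into weights that sum to 1.
    weights = [softmax(row) for row in scores]
    # Weighted sum of the value vectors gives the output, one row per token.
    output = matmul(weights, v)
    return weights, output

# Three toy 2-dimensional tokens and identity projections (illustrative only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, output = attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1, so every output row is a convex combination of the value vectors. In a trained model the projection matrices are learned parameters rather than identity matrices.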
Overview
The Big Picture
> Overview — explain the concept from first principles — content coming soon
Core Concepts
Key Ideas to Know
> Core concepts — break down each key idea with diagrams — content coming soon
Deep Dive
How It Actually Works
> Deep dive — step-by-step walkthrough with math or code — content coming soon
Interview Questions
Common Questions & Answers
> Q&A — frequently asked questions with model answers — content coming soon
Gotchas
What Trips People Up
> Gotchas — subtle points, common misconceptions, edge cases — content coming soon