
Transformers & Attention

From the original paper to multi-head attention, positional encoding, and why the architecture took over everything.

Input Tokens → Q / K / V → Attention Scores → Weighted Sum → Output
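
The pipeline above can be sketched in a few lines of plain Python. This is a minimal illustration of single-head scaled dot-product self-attention, not a reference implementation: the helper names (`matmul`, `softmax`, `attention`), the toy embeddings, and the identity projection matrices are all assumptions chosen to keep the example small. It leaves out masking, multiple heads, and positional encoding.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:             token embeddings, shape (n_tokens, d_model)
    w_q, w_k, w_v: projection matrices, shape (d_model, d_k)
    Returns (output, weights).
    """
    # Input Tokens -> Q / K / V: project each token three ways.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)

    # Q / K / V -> Attention Scores: dot every query with every key,
    # scale by sqrt(d_k), then normalize each row with softmax.
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]

    # Attention Scores -> Weighted Sum -> Output: mix the value vectors.
    output = matmul(weights, v)
    return output, weights

# Toy example: three tokens with 2-dimensional embeddings (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed projections, so Q = K = V = x
output, weights = attention(x, identity, identity, identity)
```

Each row of `weights` is a probability distribution over the input tokens, and each row of `output` is the corresponding weighted average of the value vectors. In a trained model the three projection matrices would be learned rather than fixed to the identity.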

Overview

The Big Picture

> Overview — explain the concept from first principles — content coming soon

Core Concepts

> Core concepts — break down each key idea with diagrams — content coming soon

Deep Dive

> Deep dive — step-by-step walkthrough with math or code — content coming soon

Interview Questions

> Q&A — frequently asked questions with model answers — content coming soon

Gotchas

> Gotchas — subtle points, common misconceptions, edge cases — content coming soon