LLM Evaluation

A practical guide to evaluating LLM outputs — metrics, frameworks, LLM-as-judge, and building eval suites that catch real regressions.

Prompt → LLM Output → Reference → Judge / Metric → Score
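
The flow above (prompt, LLM output, reference, judge or metric, score) can be sketched as a small evaluation loop. This is a minimal illustration under stated assumptions, not a definitive harness: `toy_model`, `CASES`, `run_eval`, and the two metric functions are hypothetical names invented for this example, the model is a canned stand-in rather than a real LLM call, and the whitespace normalizer is deliberately crude.

```python
from collections import Counter
from typing import Callable


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately crude normalizer)."""
    return text.lower().split()


def exact_match(output: str, reference: str) -> float:
    """1.0 if the normalized output equals the normalized reference, else 0.0."""
    return 1.0 if normalize(output) == normalize(reference) else 0.0


def token_f1(output: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall against the reference."""
    out, ref = normalize(output), normalize(reference)
    if not out or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(out == ref)
    overlap = sum((Counter(out) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def run_eval(
    model: Callable[[str], str],
    cases: list[tuple[str, str]],
    metric: Callable[[str, str], float],
) -> float:
    """Prompt -> output -> compare with reference -> mean score over all cases."""
    scores = [metric(model(prompt), reference) for prompt, reference in cases]
    return sum(scores) / len(scores)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned answers."""
    canned = {
        "Capital of France?": "Paris",
        "2 + 2 = ?": "The answer is 4",
    }
    return canned.get(prompt, "")


# Hypothetical (prompt, reference) pairs, invented for illustration.
CASES = [
    ("Capital of France?", "paris"),
    ("2 + 2 = ?", "4"),
]

if __name__ == "__main__":
    print("exact match:", run_eval(toy_model, CASES, exact_match))
    print("token F1:   ", run_eval(toy_model, CASES, token_f1))
```

On these two cases the metrics disagree: exact match rejects the verbose second answer outright, while token F1 gives it partial credit. Which behavior is preferable depends on the task, which is one reason the metric is worth treating as a swappable component of the loop.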

Overview

The Big Picture

> Overview — explain the concept from first principles — content coming soon

Core Concepts

> Core concepts — break down each key idea with diagrams — content coming soon

Deep Dive

> Deep dive — step-by-step walkthrough with math or code — content coming soon

Interview Questions

> Q&A — frequently asked questions with model answers — content coming soon

Gotchas

> Gotchas — subtle points, common misconceptions, edge cases — content coming soon