LLM Evaluation

A practical guide to evaluating LLM outputs — metrics, frameworks, LLM-as-judge, and building eval suites that catch real regressions.

  • ROUGE
  • BERTScore
  • LLM-as-Judge
  • Ragas
  • Human Eval

[Diagram: Prompt, LLM Output, Reference, Judge / Metric, Score]

Overview

The Big Picture

> Overview — explain the concept from first principles — content coming soon

Core Concepts

Key Ideas to Know

> Core concepts — break down each key idea with diagrams — content coming soon

Deep Dive

How It Actually Works

> Deep dive — step-by-step walkthrough with math or code — content coming soon

Interview Questions

Common Questions & Answers

> Q&A — frequently asked questions with model answers — content coming soon

Gotchas

What Trips People Up

> Gotchas — subtle points, common misconceptions, edge cases — content coming soon