LLM Evaluation
A practical guide to evaluating LLM outputs — metrics, frameworks, LLM-as-judge, and building eval suites that catch real regressions.
- ROUGE
- BERTScore
- LLM-as-Judge
- Ragas
- Human Eval
Prompt → LLM Output → Reference → Judge / Metric → Score
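The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `unigram_f1` is a simplified unigram-overlap score in the spirit of ROUGE-1 (whitespace tokenization, no stemming), and `evaluate` and `fake_model` are hypothetical names introduced only for this example.

```python
from collections import Counter
from typing import Callable


def unigram_f1(output: str, reference: str) -> float:
    """Simplified unigram-overlap F1 (ROUGE-1 style, whitespace tokens only)."""
    out_tokens = output.lower().split()
    ref_tokens = reference.lower().split()
    if not out_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(out_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    prompt: str,
    generate: Callable[[str], str],
    reference: str,
    metric: Callable[[str, str], float] = unigram_f1,
) -> float:
    """Prompt -> LLM output -> compare against reference -> score."""
    output = generate(prompt)
    return metric(output, reference)


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return "the cat sat on the mat"


score = evaluate("Describe the scene.", fake_model, "the cat is on the mat")
print(round(score, 3))
```

Passing the metric as a parameter keeps the same loop usable when the string-overlap function is swapped for an embedding-based score or an LLM judge.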
Overview
The Big Picture
> Overview — explain the concept from first principles — content coming soon
Core Concepts
Key Ideas to Know
> Core concepts — break down each key idea with diagrams — content coming soon
Deep Dive
How It Actually Works
> Deep dive — step-by-step walkthrough with math or code — content coming soon
Interview Questions
Common Questions & Answers
> Q&A — frequently asked questions with model answers — content coming soon
Gotchas
What Trips People Up
> Gotchas — subtle points, common misconceptions, edge cases — content coming soon