Confidence Calibration — Measuring and Correcting Self-Reported Certainty

Core 6 min +25 XP
💡
THE ANALOGY

Imagine a weather forecast service that says '80% chance of rain' every time it's uncertain. If it rains on only 50% of those occasions, the forecast is miscalibrated: 80% confidence should mean 80% accuracy. Calibrating Claude's confidence means measuring whether 'high confidence' actually corresponds to high accuracy.
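
The forecast arithmetic can be sketched in a few lines. This is only an illustration: the function name `calibration_gap` and the ten-day sample are assumptions made up for the example, not part of any library or real dataset.

```python
def calibration_gap(stated_confidence, outcomes):
    """Return stated confidence minus the observed hit rate.

    A positive gap indicates overconfidence; a negative gap
    indicates underconfidence.
    """
    if not outcomes:
        raise ValueError("need at least one verified outcome")
    observed = sum(outcomes) / len(outcomes)
    return stated_confidence - observed


# Hypothetical data: ten forecasts made at 80%, with rain on five of them.
rained = [True, False, True, False, True, False, True, False, True, False]
gap = calibration_gap(0.80, rained)  # positive, so the forecast is overconfident
```

A gap near zero would suggest the stated confidence can be taken at face value; the further it drifts from zero, the less the label can be trusted as-is.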

⚠️ EXAM TRAP — The Wrong Answer People Choose

Trusting self-reported confidence without calibration. Claude may consistently over-estimate or under-estimate its certainty on specific task types. Calibration reveals and corrects these systematic biases.

KEY POINTS
1 Calibration: measure actual accuracy per confidence tier using human-verified ground truth.
2 Overconfidence: Claude reports 'high' but accuracy is only 75% — adjust routing to treat 'high' as 'medium'.
3 Underconfidence: Claude reports 'low' but accuracy is 90% — adjust routing to treat 'low' as 'medium'.
4 Task-specific calibration: calibration varies by document type, language, and domain — calibrate per task type.
5 Stratified sampling: draw a random sample from each confidence tier, sized in proportion to the tier, so that every tier's accuracy is measured.
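
The key points above can be tied together in a rough sketch. Everything named here is hypothetical: the record layout `(task_type, tier, is_correct)`, the helper names, and the 0.90 / 0.80 accuracy thresholds are assumptions chosen so the example reproduces the 75% and 90% cases from the key points. Real thresholds would depend on the cost of errors in a given pipeline.

```python
import random
from collections import defaultdict


def stratified_sample(items, fraction, seed=0):
    """Draw the same fraction from every confidence tier.

    items: list of (item_id, tier). Each non-empty tier yields at least
    one item so that no tier goes unmeasured.
    """
    rng = random.Random(seed)
    by_tier = defaultdict(list)
    for item_id, tier in items:
        by_tier[tier].append(item_id)
    return {
        tier: rng.sample(ids, max(1, round(len(ids) * fraction)))
        for tier, ids in by_tier.items()
    }


def accuracy_by_tier(records):
    """Compute accuracy per (task_type, tier) from human-verified records.

    records: iterable of (task_type, tier, is_correct).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for task_type, tier, is_correct in records:
        totals[(task_type, tier)] += 1
        hits[(task_type, tier)] += int(is_correct)
    return {key: hits[key] / totals[key] for key in totals}


def effective_tier(reported, accuracy, high_min=0.90, low_max=0.80):
    """Map a reported tier to the tier used for routing.

    Assumed rule: 'high' must reach high_min accuracy to stay 'high';
    'low' that exceeds low_max accuracy is promoted to 'medium'.
    """
    if reported == "high" and accuracy < high_min:
        return "medium"  # overconfident tier
    if reported == "low" and accuracy > low_max:
        return "medium"  # underconfident tier
    return reported


# Hypothetical review results for one task type.
records = (
    [("invoice", "high", True)] * 3 + [("invoice", "high", False)]
    + [("invoice", "low", True)] * 9 + [("invoice", "low", False)]
)
accuracy = accuracy_by_tier(records)
routing = {key: effective_tier(key[1], acc) for key, acc in accuracy.items()}
```

With this sample data, both the 'high' tier (75% accurate) and the 'low' tier (90% accurate) end up routed as 'medium'. Keying the results by task type keeps one document type's calibration from masking another's.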