Machine Heart
Mar 31, 2026 · Artificial Intelligence
Can LLM Judges Be Trusted? TrustJudge Leverages Full Probability Distributions
LLM judges often produce contradictory scores and non-transitive preferences. The TrustJudge framework replaces discrete scoring with distribution-sensitive scoring and likelihood-aware aggregation, which reduces both score-comparison and pairwise-transitivity inconsistencies across multiple model families, improves evaluation accuracy, and can also serve as a reward signal for RL training.
LLM evaluation · TrustJudge · inconsistency reduction
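To make the two ideas in the summary concrete, here is a minimal Python sketch. It rests on assumptions that are not confirmed by the summary itself: that "distribution-sensitive scoring" means taking the expectation of the judge's probability distribution over a rating scale rather than its single most likely rating, and that "likelihood-aware aggregation" can be approximated by averaging the judge's preference probabilities over both presentation orders of a pair. The function names, the 1 to 5 scale, and all probabilities are hypothetical illustrations, not TrustJudge's actual implementation or data.

```python
# Hypothetical sketch; names, scale, and numbers are illustrative assumptions.

SCALE = [1, 2, 3, 4, 5]  # assumed rating scale


def discrete_score(probs, scale=SCALE):
    """Conventional judge output: the single most likely rating."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return scale[best]


def expected_score(probs, scale=SCALE):
    """Distribution-sensitive score: expectation over the full distribution."""
    return sum(p * s for p, s in zip(probs, scale))


def aggregate_preference(first_order, swapped_order):
    """Assumed aggregation: average preference probabilities over both
    presentation orders, then pick the label with the highest mean."""
    avg = {k: (first_order[k] + swapped_order[k]) / 2 for k in first_order}
    return max(avg, key=avg.get), avg


# Two made-up rating distributions with the same most likely rating (4).
probs_a = [0.00, 0.05, 0.15, 0.45, 0.35]
probs_b = [0.05, 0.15, 0.30, 0.45, 0.05]

print(discrete_score(probs_a), discrete_score(probs_b))
print(round(expected_score(probs_a), 2), round(expected_score(probs_b), 2))

# Made-up pairwise probabilities; keys refer to the same responses in both orders.
shown_a_first = {"A": 0.55, "B": 0.40, "tie": 0.05}
shown_b_first = {"A": 0.30, "B": 0.60, "tie": 0.10}
winner, averaged = aggregate_preference(shown_a_first, shown_b_first)
print(winner)
```

In this toy case both responses receive the same discrete rating, so a discrete judge cannot separate them, while the expected scores differ. The pairwise example shows how a verdict that flips with presentation order can be resolved by pooling probabilities instead of taking one hard decision per order.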
