Machine Heart
Mar 31, 2026 · Artificial Intelligence

Can LLM Judges Be Trusted? TrustJudge Leverages Full Probability Distributions

LLM judges often produce contradictory scores and non-transitive preferences. The TrustJudge framework replaces discrete scoring with distribution-sensitive scoring and likelihood-aware aggregation, which sharply reduces both score-comparison and pairwise-transitivity inconsistencies across multiple model families, improves evaluation accuracy, and can even serve as a reward signal for RL training.
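
To illustrate why a full probability distribution can carry more information than a single discrete score, here is a minimal sketch. It assumes the judge exposes a probability for each candidate score; the function names and the example distributions are hypothetical and are not taken from TrustJudge itself, whose exact scoring formula may differ.

```python
def discrete_score(probs):
    """Return the single most likely score (conventional discrete judging)."""
    return max(probs, key=probs.get)


def expected_score(probs):
    """Return the probability-weighted mean score over the whole distribution.

    Assumption: `probs` maps each candidate score to the judge's probability
    for it. The values are renormalized in case they do not sum to 1.
    """
    total = sum(probs.values())
    return sum(score * p for score, p in probs.items()) / total


# Hypothetical judge distributions for two responses (illustrative numbers only).
response_a = {4: 0.6, 5: 0.4}
response_b = {3: 0.4, 4: 0.6}

# Discrete scoring gives both responses the same score, hiding the difference.
tie = discrete_score(response_a) == discrete_score(response_b)

# The expectation separates them because it uses the remaining probability mass.
a_is_better = expected_score(response_a) > expected_score(response_b)
```

In this toy case both responses receive a discrete score of 4, while the expected scores differ, so the distribution-based score can break a tie that discrete scoring cannot.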

LLM evaluation · TrustJudge · inconsistency reduction
12 min read