Defining a Good Answer in the Agent Era: A Rubrics Survey
This survey examines how rubrics (structured, multi-dimensional evaluation criteria) are defined, constructed, and applied to train and evaluate large language models, especially on open-ended, high-risk, and agentic tasks. It also highlights current challenges such as reward hacking and bias.
