Tagged articles
2 articles
AI Tech Publishing
Jan 10, 2026 · Artificial Intelligence

Anthropic Engineers Reveal a Pragmatic Framework for Evaluating AI Agents

Anthropic engineers explain why rigorous AI agent evaluation is essential, describe an evaluation harness built from tasks, trials, graders, and transcripts, compare capability tests with regression tests, discuss code-based, model-based, and human graders, and present an eight-step roadmap for building reliable agent evaluation pipelines.

AI Agent · Capability Evaluation · Code-based Grader
12 min read
DataFunSummit
Jan 13, 2023 · Artificial Intelligence

2022 Digital Human System Basic Capability Evaluation and Observations

This report presents the background, methodology, evaluation model, results, and key observations of the 2022 digital human system basic capability assessment, highlighting technical, engineering, and security challenges, industry standards development, and future work to advance digital human technologies.

Artificial Intelligence · Capability Evaluation · Digital Human
12 min read