Tagged articles
14 articles
dbaplus Community
May 16, 2026 · Artificial Intelligence

Can Your AI Skill Pass? An 8‑Dimension Quantitative Evaluation Framework

This article introduces an eight‑dimension quantitative framework for assessing AI Skills, detailing each metric—from metadata quality to scope focus—explaining weighted scoring, demonstrating evaluations on real Skills and comparative cases, and presenting a multi‑model cross‑validation process with four execution strategies to turn subjective judgments into measurable grades.

AI Skill · Evaluation Framework · Execution Strategy
16 min read
AI Tech Publishing
Apr 27, 2026 · Artificial Intelligence

Why Build Your Own AI Evaluation Harness? 7 OpenAI‑Inspired Recommendations

The article explains why generic AI testing platforms fall short, outlines how to design a testable AI system from day one, and presents seven practical recommendations—from using Codex or Claude Code to manage regression and iteration test sets, to leveraging entropy diagnostics and custom domain‑expert UX.

AI Evaluation · Evaluation Framework · OpenAI
8 min read
PMTalk Product Manager Community
Apr 14, 2026 · Product Management

Why Evaluation and Decomposition, Not Prototyping, Are the Core Skills for AI Product Managers

Traditional product tactics like building features first and relying on gradual rollout no longer work for AI agents; instead, AI product managers must adopt a rigorous, scenario‑driven evaluation framework that measures result quality, task completion, tool correctness, and security to ensure trustworthy, business‑critical performance.

AI product management · AI reliability · Agent AI
10 min read
AI Step-by-Step
Mar 28, 2026 · Artificial Intelligence

How to Evaluate Agent Performance Across Different Scenarios

The article proposes a four‑dimensional framework—task result, output structure, behavior boundary, and long‑term stability—to systematically validate AI agents in varied business contexts such as e‑commerce, manufacturing, insurance, and HR, emphasizing concrete evidence over subjective impressions.

AI Agent · Evaluation Framework · R&D management
10 min read
Machine Learning Algorithms & Natural Language Processing
Mar 19, 2026 · Artificial Intelligence

From Language Modeling to World Modeling: Limits of Large Language Models

Speaker Li Yixia from Southern University of Science and Technology presents a talk on using large language models as textual world models, defining a three‑layer evaluation framework and showing through experiments that fine‑tuned models improve next‑state prediction and agent performance, yet face limits tied to behavior coverage and environment complexity.

Evaluation Framework · agent performance · large language models
4 min read
Youzan Coder
Jan 13, 2026 · Artificial Intelligence

From Hackathon to Scalable AI Customer Service: Lessons and Best Practices

This article chronicles the end‑to‑end development of an AI‑driven customer service system, detailing the shift from a rapid‑prototype Dify platform to a hybrid engineering architecture, model selection strategies, workflow design, knowledge engineering, evaluation methods, and future directions for continuous improvement.

AI Customer Service · Evaluation Framework · Prompt engineering
21 min read
AI Tech Publishing
Jan 10, 2026 · Artificial Intelligence

Anthropic Engineers Reveal a Pragmatic Framework for Evaluating AI Agents

Anthropic engineers outline why rigorous AI Agent evaluation is essential, describe a comprehensive evaluation harness with tasks, trials, graders, and transcripts, compare capability and regression tests, discuss code-, model-, and human-based graders, and present an eight-step roadmap for building reliable Agent assessment pipelines.

AI Agent · Capability Evaluation · Code-based Grader
12 min read
PaperAgent
Jan 10, 2026 · Artificial Intelligence

How to Build Robust Evaluations for AI Agents: A Complete Roadmap

Anthropic’s new blog reveals a comprehensive framework for evaluating AI agents, detailing evaluation structures, metrics like pass@k and pass^k, types of scorers, multi‑round testing, and a step‑by‑step roadmap for designing, maintaining, and integrating automated assessments into agent development pipelines.

AI Evaluation · AI agents · Evaluation Framework
15 min read
AI2ML AI to Machine Learning
Sep 24, 2025 · Artificial Intelligence

Key Points for Evaluating AI Agents

The article explains how Coze's Compass introduces a flexible evaluation system for AI agents, outlines a four‑dimensional submodule assessment (planning, tool use, self‑reflection, memory), and details specific testing criteria and challenges for web, scientific, dialogue, and programming agents.

AI agents · Benchmarking · Coze
6 min read
Bighead's Algorithm Notes
Aug 31, 2025 · Artificial Intelligence

Paper Review: AlphaEval – A Comprehensive, Efficient Framework for Evaluating Alpha Mining

AlphaEval is a unified, parallelizable evaluation framework that assesses Alpha mining models across predictive ability, time stability, market‑perturbation robustness, financial logic, and diversity without backtesting, matching full backtest results while offering higher efficiency and open‑source reproducibility.

Alpha Mining · Evaluation Framework · LLM
10 min read
DataFunSummit
Jun 13, 2024 · Product Management

Data‑Driven KOL Marketing Strategies for Game Growth in Western Markets

This article explains how Tencent IEGG leverages data‑driven KOL marketing across four key scenarios—budget planning, KOL evaluation, performance measurement, and competitor monitoring—to address cultural differences, optimize spend, and boost game adoption in the European‑American PC and console markets.

Budget Planning · Competitive Monitoring · Evaluation Framework
16 min read
Airbnb Technology Team
Nov 3, 2022 · Artificial Intelligence

T-LEAF: A Taxonomy Learning and Evaluation Framework for Airbnb Community Support Classification System

The T‑LEAF framework introduces quantitative metrics for coverage, usefulness, and consistency to iteratively develop Airbnb’s unified Contact‑Reason taxonomy, enabling faster feedback loops, reducing “Other” classifications, and improving both human annotation agreement and machine‑learning prediction accuracy in production.

Evaluation Framework · classification · community support
14 min read
DataFunTalk
Feb 13, 2022 · Big Data

How Kuaishou Built a Standardized Data Governance Evaluation System

This article outlines Kuaishou’s approach to establishing a standardized data governance evaluation framework, detailing the challenges of large‑scale data management, the design of assessment metrics across model, quality, and cost dimensions, and the practical strategies and operational mechanisms used to improve data asset health and business value.

Big Data · Evaluation Framework · Kuaishou
21 min read
Efficient Ops
May 18, 2020 · Artificial Intelligence

How China’s AI Alliance Is Shaping RPA Evaluation Standards

The article outlines the AIIA‑hosted RPA technology salon, details the newly built RPA evaluation framework, explains RPA fundamentals and AI‑driven RPA 4.0 trends, and presents the alliance’s roadmap for standards and testing to boost successful automation deployments.

AI integration · Evaluation Framework · RPA
6 min read