Tagged articles
4 articles
FunTester
May 22, 2026 · Artificial Intelligence

Why Prompt Tuning Isn’t Enough: Building a Test‑Driven Mindset for AI Products

The article argues that prompt engineering accelerates early AI product development but cannot guarantee overall quality. It advocates a systematic evaluation pipeline (curated datasets, clear benchmarks, regression testing, and automated checks) so that AI product quality becomes visible and improves reliably over time.

AI testing · Prompt Engineering · Quality Assurance
16 min read
Huawei Cloud Developer Alliance
May 19, 2026 · Artificial Intelligence

How Cloud Agent Harness Grows Skills from Real Tasks: A Three‑Stage Self‑Evolution Mechanism

The article analyzes Huawei Cloud Agent Harness's three‑stage skill self‑evolution framework, detailing how agents automatically extract, evolve, and validate reusable skills from execution traces to overcome manual authoring bottlenecks and ensure continuous improvement.

AI agents · LLM‑driven optimization · evaluation pipeline
14 min read
AI Tech Publishing
Apr 29, 2026 · Artificial Intelligence

Who Tests When AI Generates 99% of Code? Inside a Self‑Repairing Agent Harness

The article explains how a self‑repairing Agent Harness replaces traditional QA with a loop of evaluation, triage, automated fixing, verification, and AI‑gated canary release, relying on a three‑judge reviewer, model‑based sampling, and six daily engineering tasks to keep AI‑driven products reliable.

AI agents · AI-driven QA · Continuous Deployment
16 min read
Java One
Apr 13, 2026 · Artificial Intelligence

How to Build a Complete Prompt Evaluation Pipeline for Reliable AI Outputs

This guide walks you through constructing a full prompt‑evaluation workflow—from drafting prompts and generating a test dataset to running Claude, scoring responses with model‑ and code‑based metrics, and iterating until your prompts are data‑driven and trustworthy.

AI model · Claude · Prompt Engineering
25 min read