AI Tech Publishing
Mar 7, 2026 · Artificial Intelligence

A Practical Guide to Evaluating Agent Skills

This article explains why many Agent Skills are released without testing, defines measurable success criteria, and presents a lightweight evaluation framework (prompt set creation, deterministic checks, optional LLM-based qualitative checks, and best-practice recommendations), demonstrated by improving a Gemini Interactions API skill from a 66.7% to a 100% pass rate.

AI agents · Agent Skills · Gemini
0 likes · 13 min read
AI Large Model Application Practice
Sep 14, 2023 · Artificial Intelligence

How LangSmith Turns LLM Debugging into Production‑Ready Insight

This article explores how LangSmith, an experimental platform from the LangChain team, bridges the gap between prototype LLM applications and production by providing tracing, debugging, testing, evaluation, and run-management features that help developers monitor and improve generative AI systems.

AI Observability · LLM debugging · LLM evaluation
0 likes · 11 min read