From Zero to Production: Building High‑Availability, Reusable AI Skills
This guide walks you through the six common pitfalls of AI Skill development, presents best‑practice recommendations, outlines a five‑step production‑grade workflow, and showcases three complete real‑world Skill examples for operations, content publishing, and code review.
Common Pitfalls to Avoid
Many developers encounter six recurring errors when writing Skills: vague descriptions that never trigger, overly permissive allowed-tools leading to security incidents, rigid workflows that break with minor requirement changes, excessively long content that consumes half the context window, lack of fault‑tolerance, and undefined output formats.
1. Vague description
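Incorrect: description: Data analysis
Correct:
description: Trigger automatically when the user asks about MySQL queries, data statistics, employee salary analysis, department reports, or database optimization; all database-related operations must use this Skill
Tip: List all trigger keywords and optionally add a mandatory-use clause to raise trigger probability by about 90%.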
2. Over‑permissive allowed-tools
Incorrect: allowed-tools: * (allows dangerous commands like rm -rf / or drop database)
Correct:
allowed-tools: Read, Bash(python:*, mysql:-e SELECT*), Write(*.md, *.sql)
Tip: Apply the principle of least privilege; restrict to specific commands or file extensions whenever possible.
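The least-privilege rule can be checked mechanically before a Skill ships. The following is a minimal sketch, assuming the comma-separated `allowed-tools` syntax shown in this article; the `RISKY_PATTERNS` deny-list and both function names are hypothetical and should be adapted to your own environment.

```python
import re

# Hypothetical deny-list: entries that grant unrestricted access.
RISKY_PATTERNS = [r"^\*$", r"^Bash$", r"^Bash\(\*\)$"]


def split_tools(value: str) -> list[str]:
    """Split an allowed-tools string on commas that sit outside parentheses."""
    tools, depth, current = [], 0, ""
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            tools.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        tools.append(current.strip())
    return tools


def find_risky_tools(value: str) -> list[str]:
    """Return the entries that match the deny-list."""
    return [tool for tool in split_tools(value)
            if any(re.match(pattern, tool) for pattern in RISKY_PATTERNS)]
```

Running `find_risky_tools` on the "Incorrect" value flags the wildcard, while the "Correct" value passes with no findings.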
3. Rigid workflow
Incorrect: "Must read a.md, then b.md, then generate report to c.md"
Correct: "Read a.md and b.md for background; if missing, search automatically; generate report to c.md but allow user‑specified path"
Tip: Define core rules only; let the AI adjust steps as needed.
4. Overly long content
Incorrect: Embedding hundreds of SQL rules directly in SKILL.md, consuming half the context window.
Correct: Reference detailed rules in a separate reference.md and load them only when required.
Tip: Keep core rules in the main file; externalize detailed examples to save up to 80% of context.
5. No fault tolerance
Incorrect: Assuming all commands succeed.
Correct: On SQL failure, first verify syntax, then connection, retry up to two times, and finally report the error to the user.
Tip: Add explicit error‑handling rules for every potentially failing step.
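The retry rule described for SQL failures can be sketched as a small helper. This is an illustration under stated assumptions, not part of any Skill runtime: `run_with_retry` and its parameters are hypothetical names, and `action` stands for whatever callable executes the query.

```python
import time


def run_with_retry(action, retries: int = 2, delay: float = 1.0) -> dict:
    """Call action(); on failure retry up to `retries` more times, then report.

    Returns a dict so the caller can show the collected errors to the user
    instead of silently assuming success.
    """
    errors = []
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "value": action(), "attempts": attempt + 1}
        except Exception as exc:  # collect every failure for the final report
            errors.append(str(exc))
            if attempt < retries:
                time.sleep(delay)
    return {"ok": False, "errors": errors, "attempts": retries + 1}
```

With `retries=2` the action runs at most three times, which matches the "retry up to two times, then report" rule; syntax and connection checks would go inside `action` or before the call.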
6. Undefined output format
Incorrect: No specification, leading to inconsistent results.
Correct: Require three parts – SQL statement, result table, and analysis conclusion – and forbid extra explanatory text.
Tip: Provide a concrete example to ensure deterministic output.
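A fixed output format is also easy to verify automatically. The sketch below assumes the three parts are introduced by the labels `SQL:`, `Result:`, and `Conclusion:`; those labels and the function name are hypothetical, chosen only to illustrate the check.

```python
# Hypothetical section labels for the three required parts, in order.
REQUIRED_SECTIONS = ["SQL:", "Result:", "Conclusion:"]


def check_output(text: str) -> list[str]:
    """Return a list of format problems; an empty list means the output conforms."""
    problems = []
    position = 0
    for label in REQUIRED_SECTIONS:
        found = text.find(label, position)
        if found == -1:
            problems.append(f"missing or out-of-order section: {label}")
        else:
            position = found + len(label)
    # Anything before the first label counts as forbidden explanatory text.
    if text.strip() and not text.lstrip().startswith(REQUIRED_SECTIONS[0]):
        problems.append("extra text before the first section")
    return problems
```

A check like this can run over sampled Skill responses to catch drift from the specified format.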
Five‑Step Production‑Grade Skill Development Process
Step 1 – Requirement Analysis
Identify trigger scenarios, keywords, and forbidden contexts.
Define capability boundaries (what the Skill can and cannot do).
Specify exact output requirements and format.
Deliverable: a concise (<100‑word) requirement statement.
Step 2 – Metadata Design
Key fields must be accurate:
name : all lowercase, hyphenated (e.g., mysql-nl2sql).
description : "trigger scenario + mandatory‑use" format.
allowed-tools : minimal‑permission list.
model : choose a large model for complex tasks and a small model for simple ones.
context: fork : run risky operations in an isolated context.
disable-model-invocation: true : require manual activation to avoid accidental triggers.
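The naming and description rules above lend themselves to a quick lint. The following sketch assumes the frontmatter consists of plain `key: value` lines between two `---` markers, as in the examples later in this article; it is a simplified parser written for illustration, not a full YAML implementation, and the function names are hypothetical.

```python
import re

# Lowercase words or digits joined by single hyphens, e.g. mysql-nl2sql.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` lines between the first two `---` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def validate_metadata(fields: dict[str, str]) -> list[str]:
    """Return a list of problems found in the metadata fields."""
    problems = []
    if not NAME_RE.match(fields.get("name", "")):
        problems.append("name must be lowercase and hyphenated")
    if not fields.get("description"):
        problems.append("description is required")
    return problems
```

Because `partition` splits on the first colon only, values such as `Bash(python:*)` survive intact.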
Step 3 – Content Authoring
Structure the Skill body in this order:
Argument reception : first line captures user input ($ARGUMENTS).
Background information : provide necessary context such as DB address or API docs.
Core workflow : step‑by‑step actions, only core rules.
Prohibited rules : explicitly forbid actions like generating DELETE or UPDATE statements.
Output specification : define exact format and give examples.
Reference material : link to auxiliary files.
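To keep every Skill body in this order, a team could generate the skeleton rather than type it by hand. This is a minimal sketch; `build_skill_body`, its parameters, and the English section titles are hypothetical, and the title line is placed before the argument line to mirror the example Skills in this article.

```python
def build_skill_body(title: str, background: str, steps: list[str],
                     prohibited: list[str], output_example: str,
                     references: list[str]) -> str:
    """Render the six body sections in the recommended order as Markdown."""
    lines = [
        f"# {title}",
        "## Input: $ARGUMENTS",   # argument reception comes first
        "## Background",
        background,
        "## Workflow",
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["## Prohibited"] + [f"- {rule}" for rule in prohibited]
    lines += ["## Output format", output_example]
    lines += ["## Reference"] + [f"- {ref}" for ref in references]
    return "\n".join(lines)
```

The returned string can be appended to the frontmatter and written to `SKILL.md`.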
Step 4 – Testing & Validation
Perform three categories of tests, each with at least five scenarios:
Trigger test : verify correct activation and absence of false positives.
Workflow test : ensure the Skill follows the defined process under varied inputs.
Security test : attempt dangerous commands and confirm they are blocked by permissions.
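The "at least five scenarios per category" rule can be enforced on the test plan itself. The sketch below assumes each scenario is recorded as a dict with a `category` key; that record shape and the function name are assumptions made for illustration.

```python
from collections import Counter

MIN_PER_CATEGORY = 5
CATEGORIES = ("trigger", "workflow", "security")


def coverage_gaps(scenarios: list[dict]) -> dict[str, int]:
    """Return how many scenarios each category still needs to reach the minimum."""
    counts = Counter(scenario["category"] for scenario in scenarios)
    return {category: MIN_PER_CATEGORY - counts[category]
            for category in CATEGORIES
            if counts[category] < MIN_PER_CATEGORY}
```

An empty result means the plan meets the minimum in all three categories.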
Step 5 – Deployment & Iteration
After launch, continuously record:
Trigger frequency (calls per day).
Success rate (successful vs. failed executions).
User feedback (mismatches, missing features).
Iteratively refine description, workflow, and permissions based on the data.
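The first two metrics can be computed from a plain invocation log. This is a sketch under the assumption that each log record is a dict with a `date` string and an `ok` boolean; the record shape and the `summarize` name are hypothetical.

```python
from collections import Counter


def summarize(records: list[dict]) -> dict:
    """Compute calls per day and the overall success rate from invocation records."""
    calls_per_day = Counter(record["date"] for record in records)
    total = len(records)
    successes = sum(1 for record in records if record["ok"])
    return {
        "calls_per_day": dict(calls_per_day),
        "success_rate": successes / total if total else 0.0,
    }
```

A falling success rate or a Skill that is rarely triggered points to the description, workflow, or permissions as the first things to revisit.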
Three Production‑Grade Skill Cases
Case 1 – Operations: Online Troubleshooting Skill
---
name: online-troubleshooting
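description: Trigger automatically when the user reports a production service outage, errors, slowness, or 5xx responses; all production troubleshooting must use this Skill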
allowed-tools: Bash(curl:*, kubectl:*, tail:*, top:*, grep:*), Read
model: sonnet
context: fork
---
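# Online Troubleshooting Skill
## Incident description: $ARGUMENTS
## Troubleshooting workflow
1. Check the service status first:
```bash
kubectl get pods | grep "$SERVICE_NAME"
```
2. Review the last 500 log lines:
```bash
tail -n 500 /var/log/service.log | grep ERROR
```
3. Check CPU and memory usage:
```bash
top -b -n 1 | grep java
```
4. Check the database connection:
```bash
curl http://localhost:8080/health
```
5. Output the troubleshooting report in the specified format
## Prohibited operations
- Do not run any command that modifies configuration, restarts services, or deletes files
- Do not disclose any sensitive information (passwords, keys, user data)
## Output format
```
Root cause: [specific cause]
Temporary fix: [command that can be run immediately]
Long-term recommendation: [permanent fix]
```
## Reference
See `troubleshooting-manual.md` for the common troubleshooting manual
Result: manual troubleshooting reduced from 30 minutes to 5 minutes; new team members handle 80% of common issues.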
Case 2 – Content: WeChat Publishing Skill
---
name: wechat-publish
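description: Trigger automatically when the user asks to publish a WeChat Official Account article, push a draft, generate a cover image, or format an article; all Official Account operations must use this Skill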
allowed-tools: Read, Write, Bash(node:*, pandoc:*)
model: opus
---
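# WeChat Official Account Publishing Skill
## Article path/content: $ARGUMENTS
## Workflow
1. Read the article and check it for sensitive words and policy-violating content
2. Format automatically: short paragraphs, bold key points, language tags on code blocks
3. Generate HTML suited to the Official Account:
```bash
pandoc -s "$MD_PATH" -o "$HTML_PATH"
```
4. Generate the cover image: call the image-generation script to produce a 16:9 copyright-free cover
5. Push to the Official Account draft box:
```bash
node scripts/publish_with_cover.js --article "$MD_PATH" --html "$HTML_PATH"
```
6. Return the draft ID and the preview link
## Formatting rules
- Keep each paragraph to three lines or fewer, with generous whitespace
- Bold the key points; do not use italics
- Add a blue left border to code blocks
- Upload all images to the WeChat CDN automatically
## Prohibited rules
- Do not publish articles containing sensitive words, political content, or policy-violating material
- Do not use copyrighted images or assets
Result: publishing time cut from 30 minutes to 2 minutes with consistent formatting.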
Case 3 – Development: Code Review Skill
---
name: code-review
description: Trigger automatically when the user asks for a review of code, a PR, or an MR; all code reviews must use this Skill
allowed-tools: Read, Bash(gh:*, grep:*), Write
model: sonnet
---
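# Code Review Skill
## PR number: $ARGUMENTS
## Workflow
1. Fetch the PR:
```bash
gh pr view $ARGUMENTS --json title,body,files
```
2. Check code style: does the code follow the team's coding standards, and are there any syntax errors?
3. Check for security vulnerabilities: SQL injection, XSS, sensitive-data leaks, and so on
4. Check for performance problems: slow queries, memory leaks, pointless loops, and so on
5. Output the review report with suggested changes
## Review criteria
- Code must be commented, and key logic must be explained
- Unit tests are required, with coverage of at least 80%
- No hard-coded passwords or keys
- Every API endpoint must validate its parameters
## Output format
```
Overall score: [1-10]
Issues found:
- [Critical] xxx issue, location: xxx, suggestion: xxx
- [Moderate] xxx issue, location: xxx, suggestion: xxx
- [Suggestion] xxx improvement, location: xxx, suggestion: xxx
```
Result: manual PR review reduced from 1 hour to 10 minutes, catching 80% of low-level issues.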
Advanced Skill Practices
Skill composition : chain multiple Skills (e.g., content‑write → wechat‑publish) for end‑to‑end automation.
Team Skill sharing : store best‑practice Skills in .claude/skills/, commit to Git, and onboard new members three times faster.
Version management : tag Skills with versions (e.g., mysql-nl2sql@v1) to enable quick rollback.
Data instrumentation : log invocation count, success rate, latency, and error reasons for continuous improvement.
Commercialization : package vertical‑domain Skills as sellable products (e‑commerce, education, enterprise ops) with pricing ranging from thousands to tens of thousands of yuan.
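For the version-management practice in this list, the rollback step can be sketched as follows. It assumes tags follow the `name@vN` pattern shown in the example (such as `mysql-nl2sql@v1`); how the tags are stored is left open, and both function names are hypothetical.

```python
import re

# Matches tags such as mysql-nl2sql@v1.
TAG_RE = re.compile(r"^(?P<name>[a-z0-9-]+)@v(?P<version>\d+)$")


def parse_tag(tag: str) -> tuple[str, int]:
    """Split a tag into its Skill name and integer version."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a valid Skill tag: {tag!r}")
    return match["name"], int(match["version"])


def rollback_target(tags: list[str], current: str):
    """Return the newest tag of the same Skill that is older than `current`, or None."""
    name, version = parse_tag(current)
    older = [v for n, v in map(parse_tag, tags) if n == name and v < version]
    return f"{name}@v{max(older)}" if older else None
```

Given the tags `mysql-nl2sql@v1` through `mysql-nl2sql@v3`, rolling back from `v3` selects `mysql-nl2sql@v2`; a Skill with only one version has no rollback target.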
In essence, a Skill codifies human expertise into a reusable, AI‑executable asset; writing it requires only Markdown proficiency, turning years of experience into a scalable product.
Architect's Ambition
Observations, practice, and musings of an architect. Here we discuss technical implementations and career development; dissect complex systems and build cognitive frameworks. Ambitious yet grounded. Changing the world with code, connecting like‑minded readers with words.
