
Boost Code Quality with a Devil‑Style Review Loop and Spec‑Kit

This article walks through a six‑step, AI‑augmented review‑driven development workflow—using Spec‑Kit commands, devil‑style iterative reviews, and TDD—to turn a simple "user login failure lock" requirement into a robust, well‑tested implementation while catching design flaws early.

Qborfy AI

Overall Workflow

The process consists of six stages, each driven by a Spec‑Kit command and a repeatable AI‑assisted review loop (the “devil’s review”). The concrete case is a "user login failure lock" feature.

1. Create Specification – /speckit.specify. Define the what and the why of the feature.

2. Clarify Requirements (optional) – /speckit.clarify. Compare the new spec with existing code to avoid conflicts.

3. Draft Technical Plan – /speckit.plan. Align the plan with the spec and industry best practices.

4. Break Down Tasks – /speckit.tasks. Produce a TDD‑ready task list.

5. Implement – /speckit.implement. Follow TDD; stop whenever a doubt arises.

6. Code Review – code‑review sub‑agent. Detect code smells using Martin Fowler’s refactoring catalogue.

Step‑by‑Step Details

1. Create Specification

Command example:

/speckit.specify We need to add a security feature to the web application: after a user fails to log in 5 times in a row, the account is temporarily locked for 15 minutes. During the lock period, login is refused even with the correct password. After unlock, the failure count is reset to zero. Administrators can manually unlock an account.

The AI generates an initial spec.md, and the reviewer then prompts it to verify completeness:

Based on the spec.md you generated, and compared against the original requirement, is everything covered? Please check and ask questions.

Typical AI‑generated questions (first round):

What exactly counts as “continuous failure”? Does a successful login reset the counter?

Should a non‑existent username be counted as a failure?

When the account is unlocked, should the failure count be cleared automatically or only after a manual unlock?

Does the admin unlock operation require permission checks and audit logging?

These questions expose ambiguities that the original requirement omitted. The reviewer answers with concrete recommendations, e.g. “reset the counter on any successful login”, and the AI updates spec.md accordingly.
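Once answered, the clarified rules can be pinned down as a small policy object so later steps have one source of truth. This is only an illustrative sketch: the names `LockoutPolicy` and `LOCKOUT_POLICY` are not from the article, and the `countUnknownUsernames` answer is an assumption (the article raises the question without settling it).

```typescript
// Hypothetical policy object capturing the clarified lockout rules.
interface LockoutPolicy {
  maxFailedAttempts: number;      // lock after this many consecutive failures
  lockDurationMinutes: number;    // how long the temporary lock lasts
  resetOnSuccess: boolean;        // any successful login clears the counter
  countUnknownUsernames: boolean; // whether non-existent usernames count as failures
}

const LOCKOUT_POLICY: LockoutPolicy = {
  maxFailedAttempts: 5,
  lockDurationMinutes: 15,
  resetOnSuccess: true,           // answer chosen during the review round
  countUnknownUsernames: false,   // assumed answer, for illustration only
};
```

Encoding the answers this way makes a later spec change a one-line diff rather than a hunt through the implementation.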

2. Clarify Requirements

If the project already exists, the AI scans the codebase and reports findings that affect the design, such as an existing IP‑based failure counter or a ready‑made admin‑log module. The reviewer decides whether to reuse or replace those components, preventing later integration bugs.

3. Draft Technical Plan

Command example:

/speckit.plan Before generating the plan, read the spec.md file first. The generated plan must meet the requirements in spec.md and must draw on mainstream industry solutions.

The AI produces plan.md containing stack choices, module decomposition, and degradation strategies. A review prompt then checks alignment with the spec and industry practice:

Please check the generated content: does it match the description in spec.md, can the feature be fully implemented, and does it follow mainstream industry solutions?

Typical AI‑raised concerns:

Has Redis high‑availability been considered? What is the fallback if Redis is down?

Will Redis calls increase login latency?

Does the admin unlock endpoint need a dedicated permission middleware?

The reviewer selects the best recommendation (e.g., enable Redis sentinel, add a latency‑budget check, reuse the existing adminAuth middleware) and the AI revises plan.md accordingly.
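Reusing the existing adminAuth middleware might look like the Express-style sketch below. The middleware shape, the `role` field, and the `unlockAccount` handler are all assumptions for illustration; minimal local types stand in for the real framework so the sketch is self-contained.

```typescript
// Minimal Express-like types so the sketch is self-contained.
type Req = { user?: { role?: string } };
type Res = { statusCode?: number; body?: unknown };
type Next = () => void;

// Assumed shape of the existing adminAuth middleware: reject non-admins.
function adminAuth(req: Req, res: Res, next: Next): void {
  if (req.user?.role !== "admin") {
    res.statusCode = 403;
    res.body = { error: "admin permission required" };
    return;
  }
  next();
}

// Hypothetical unlock handler that runs only after adminAuth passes.
function unlockAccount(req: Req, res: Res): void {
  // Real code would clear the failure counter in Redis and write an audit log.
  res.statusCode = 200;
  res.body = { unlocked: true };
}
```

Keeping the permission check in a shared middleware, rather than inside the unlock handler, is what lets the audit-logging concern from the first review round stay in one place.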

4. Break Down Tasks

Command example:

/speckit.tasks Before generating the tasks, read the requirements in spec.md and plan.md and, combined with the existing project code, generate task content that meets the requirements and can implement the complete feature.

The AI outputs tasks.md with granular tasks and associated test cases. A review prompt validates completeness:

Please check the generated tasks.md: does it satisfy the implementation requirements of spec.md and plan.md, and, given the existing project code, can the feature be fully implemented by following these task descriptions?

Typical AI‑identified gaps:

Missing test for “login attempts during lock period”.

No unit test covering Redis timeout and degradation.

Admin unlock lacks permission‑related test cases.

The reviewer asks the AI to add the missing scenarios, and the AI updates tasks.md until no new issues are reported.
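The first missing scenario, login attempts during the lock period, can be expressed as a concrete test against a minimal in-memory model. The `FailureTracker` class below is illustrative only (the article does not show the real implementation); the point is that the gap the AI found becomes an executable assertion rather than a line in a checklist.

```typescript
// Minimal in-memory model of the lock behaviour, used only to express
// the missing test scenario; all names are illustrative.
class FailureTracker {
  private failures = new Map<string, number>();
  private lockedUntil = new Map<string, number>();

  constructor(private maxFailures = 5, private lockMs = 15 * 60 * 1000) {}

  isLocked(user: string, now = Date.now()): boolean {
    return (this.lockedUntil.get(user) ?? 0) > now;
  }

  recordFailure(user: string, now = Date.now()): void {
    const n = (this.failures.get(user) ?? 0) + 1;
    this.failures.set(user, n);
    if (n >= this.maxFailures) this.lockedUntil.set(user, now + this.lockMs);
  }

  recordSuccess(user: string): void {
    this.failures.delete(user);
  }
}

// The previously missing scenario: attempts during the lock period stay rejected.
const t = new FailureTracker();
for (let i = 0; i < 5; i++) t.recordFailure("alice", 0);
const lockedDuringPeriod = t.isLocked("alice", 10 * 60 * 1000);  // 10 min in: still locked
const unlockedAfterExpiry = t.isLocked("alice", 16 * 60 * 1000); // 16 min in: lock expired
```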

5. Implement (TDD)

Implementation command:

/speckit.implement You must strictly follow the constitution and the descriptions in tasks.md, spec.md, and plan.md. If any doubt arises in the generated code, stop. You must follow these steps:
1. Test-driven development practice
2. The generated code must conform to the current lint rules

During code generation the AI may pause for clarification, e.g.:

“While implementing the login endpoint, about the Redis degradation strategy: if the Redis query for the lock status times out, plan.md says login should be allowed to continue. I plan to catch the exception with try-catch, log it, and assume the account is not locked. Is this implementation reasonable?”

The reviewer either answers directly or asks the AI to propose alternatives. Once the best approach is agreed, the AI writes the code, generates the corresponding unit test, runs the test suite, and proceeds to the next task only after the test passes.
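The agreed fail-open degradation can be sketched in a few lines. The `LockLookup` type and function names are assumptions; the article only describes the strategy (catch the error, log it, treat the account as not locked).

```typescript
// A lock-status lookup that may throw, standing in for a Redis call.
type LockLookup = (userId: string) => boolean;

function checkLockStatus(lookup: LockLookup, userId: string): boolean {
  try {
    return lookup(userId);
  } catch (err) {
    // Degradation agreed in the review: log the failure and fail open,
    // i.e. treat the account as not locked so a Redis outage cannot block logins.
    console.warn(`lock lookup failed for ${userId}:`, err);
    return false;
  }
}

// Simulate a Redis timeout with a lookup that throws.
const timingOut: LockLookup = () => { throw new Error("redis timeout"); };
const result = checkLockStatus(timingOut, "alice"); // fail-open: treated as not locked
```

Failing open trades a brief security gap for availability; the opposite choice (fail closed) would lock everyone out whenever Redis is down, which is why this deserved an explicit pause-and-ask rather than a silent decision.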

6. Code Review

The code‑review sub‑agent runs a checklist derived from Martin Fowler’s “code smells” (over‑long functions, duplicated code, primitive obsession, etc.). Example output:

Overall, the login function implements the requirement, but a few things could be improved: 1. Long function (login is about 60 lines): extract isUserLocked, recordFailedAttempt, and clearFailedAttempts. 2. Duplicated code: abstract it into withRedisFallback(operation, fallbackValue). 3. Primitive obsession: manage key names centrally through a REDIS_KEYS constant object.
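Applying all three suggestions might yield something like the sketch below. The function names match the review output; everything else (the key formats, the in-memory `Map` standing in for a real Redis client, the hard-coded thresholds) is assumed for illustration.

```typescript
// Centralised key names, addressing the "primitive obsession" smell.
const REDIS_KEYS = {
  failedAttempts: (userId: string) => `login:failed:${userId}`,
  lockedUntil: (userId: string) => `login:locked:${userId}`,
} as const;

const fakeRedis = new Map<string, string>(); // stand-in for a real Redis client

// Shared fallback wrapper, addressing the "duplicated code" smell.
function withRedisFallback<T>(operation: () => T, fallbackValue: T): T {
  try {
    return operation();
  } catch (err) {
    console.warn("redis operation failed, using fallback:", err);
    return fallbackValue;
  }
}

// The three helpers extracted from the 60-line login function.
function isUserLocked(userId: string, now = Date.now()): boolean {
  return withRedisFallback(() => {
    const until = fakeRedis.get(REDIS_KEYS.lockedUntil(userId));
    return until !== undefined && Number(until) > now;
  }, false); // fail open, per the agreed degradation strategy
}

function recordFailedAttempt(userId: string, now = Date.now()): number {
  return withRedisFallback(() => {
    const key = REDIS_KEYS.failedAttempts(userId);
    const count = Number(fakeRedis.get(key) ?? "0") + 1;
    fakeRedis.set(key, String(count));
    if (count >= 5) {
      fakeRedis.set(REDIS_KEYS.lockedUntil(userId), String(now + 15 * 60 * 1000));
    }
    return count;
  }, 0);
}

function clearFailedAttempts(userId: string): void {
  withRedisFallback(() => {
    fakeRedis.delete(REDIS_KEYS.failedAttempts(userId));
    fakeRedis.delete(REDIS_KEYS.lockedUntil(userId));
    return undefined;
  }, undefined);
}
```

After the refactoring, the login function itself shrinks to orchestration: check `isUserLocked`, verify the password, then call either `clearFailedAttempts` or `recordFailedAttempt`.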

After each round the AI updates the affected files (spec.md, plan.md, tasks.md) and repeats the loop until no new issues are reported.

Key Takeaways

Early AI‑driven questioning surfaces ambiguities before any code is written.

Aligning spec, plan, and tasks with industry best practices reduces hidden risks.

Embedding TDD and continuous code‑review creates a feedback loop that catches defects at the source.

The “devil’s review” (AI asks, suggests, fixes, repeats) shifts quality left and dramatically cuts rework.

Reference: Martin Fowler – Refactoring: Improving the Design of Existing Code (2nd Edition).

Written by Qborfy AI

A knowledge base that logs daily experiences and learning journeys, sharing them with you to grow together.
