Running Code Review and Voice Agents with Step Plan and Claude Code
The article walks through using Step Plan’s unified API to integrate Claude Code for automated code review and to build a voice‑agent pipeline that transcribes meeting recordings, generates structured summaries, and produces audio briefs, while discussing setup, costs, model selection, practical demos, and observed limitations.
Why Use Step Plan
Step Plan provides a single API that includes coding, ASR, TTS, and routing capabilities, enabling workflows that need to handle code, voice, and structured summaries together.
Pricing is a monthly subscription with a dual quota. For example, the Flash Mini plan costs ¥49/month and allows about 1500 prompts per week and 100 prompts per 5‑hour window. Requests sent to the regular API base URL instead of the dedicated Step Plan endpoint are routed to a different billing system.
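The dual quota can be mirrored on the client side to avoid hitting a limit mid-task. The following is a minimal sketch, assuming both limits behave as sliding windows (the article does not say how Step Plan resets them) and using the Flash Mini figures as defaults. `QuotaTracker` is a hypothetical helper, not part of any Step Plan SDK.

```javascript
// Hypothetical client-side tracker for the dual quota described above.
// Assumption: both limits are sliding windows; the real reset rules may differ.
const HOUR_MS = 60 * 60 * 1000;

class QuotaTracker {
  constructor({ perWindow = 100, windowMs = 5 * HOUR_MS, perWeek = 1500 } = {}) {
    this.perWindow = perWindow;
    this.windowMs = windowMs;
    this.perWeek = perWeek;
    this.weekMs = 7 * 24 * HOUR_MS;
    this.timestamps = []; // send times of recorded prompts, oldest first
  }

  // Count prompts recorded within the last `spanMs` milliseconds before `now`.
  countSince(now, spanMs) {
    return this.timestamps.filter((t) => now - t < spanMs).length;
  }

  // True if one more prompt fits under both limits at time `now`.
  canSend(now = Date.now()) {
    return (
      this.countSince(now, this.windowMs) < this.perWindow &&
      this.countSince(now, this.weekMs) < this.perWeek
    );
  }

  // Record a prompt; returns false (and records nothing) if a limit is reached.
  record(now = Date.now()) {
    if (!this.canSend(now)) return false;
    // Drop entries older than a week so the array stays bounded.
    this.timestamps = this.timestamps.filter((t) => now - t < this.weekMs);
    this.timestamps.push(now);
    return true;
  }
}
```

Calling `record()` before each request and backing off when it returns false keeps a long agent session from exhausting the 5‑hour window unexpectedly.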
Environment Setup: Connecting Step Plan to Claude Code
API Key
Subscribe on the Step Plan website and create an API key in the console.
Base URL Configuration
Use the dedicated Step Plan endpoint: https://api.stepfun.com/step_plan/v1
Do not use the regular endpoint, https://api.stepfun.com/v1; requests sent there are billed under the wrong plan. Note that the Claude Code settings in the next section use the Step Plan path without the /v1 suffix.
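Because the two endpoints differ only by a path segment, a small guard can catch a misconfigured base URL before any request is sent. This is a sketch only; `isStepPlanEndpoint` is a hypothetical helper that checks the host and first path segment given in the article.

```javascript
// Hypothetical guard: true only if the URL points at the Step Plan path.
function isStepPlanEndpoint(baseUrl) {
  let url;
  try {
    url = new URL(baseUrl);
  } catch {
    return false; // not a valid absolute URL
  }
  const segments = url.pathname.split("/").filter(Boolean);
  return url.hostname === "api.stepfun.com" && segments[0] === "step_plan";
}

console.log(isStepPlanEndpoint("https://api.stepfun.com/step_plan/v1")); // true
console.log(isStepPlanEndpoint("https://api.stepfun.com/v1")); // false
```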
Claude Code Integration
Claude Code can switch models via environment variables. Two methods are provided:
Configuration‑file method (recommended): install Claude Code with npm install -g @anthropic-ai/claude-code, then edit ~/.claude/settings.json to set the environment variables, e.g.:
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your_step_plan_api_key",
"ANTHROPIC_BASE_URL": "https://api.stepfun.com/step_plan",
"ANTHROPIC_MODEL": "step-3.5-flash-2603",
"ANTHROPIC_SMALL_FAST_MODEL": "step-3.5-flash",
"API_TIMEOUT_MS": "3000000",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
}
}
Replace your_step_plan_api_key with the actual key. To test the router, set ANTHROPIC_MODEL to step-router-v1.
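The same settings can be generated rather than hand-edited, which makes switching between the fixed model and the router less error-prone. The sketch below assumes a hypothetical helper, `buildClaudeSettings`, that reproduces the values from the configuration above. It does not merge with an existing settings.json.

```javascript
// Hypothetical helper that builds the settings.json content shown above.
function buildClaudeSettings(apiKey, { useRouter = false } = {}) {
  if (!apiKey || apiKey === "your_step_plan_api_key") {
    throw new Error("Set a real Step Plan API key first");
  }
  return {
    env: {
      ANTHROPIC_AUTH_TOKEN: apiKey,
      ANTHROPIC_BASE_URL: "https://api.stepfun.com/step_plan",
      // The router replaces the fixed model when useRouter is true.
      ANTHROPIC_MODEL: useRouter ? "step-router-v1" : "step-3.5-flash-2603",
      ANTHROPIC_SMALL_FAST_MODEL: "step-3.5-flash",
      API_TIMEOUT_MS: "3000000",
      CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC: "1",
    },
  };
}

// Print JSON to paste into ~/.claude/settings.json (the key here is a dummy).
console.log(JSON.stringify(buildClaudeSettings("sk-example", { useRouter: true }), null, 2));
```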
CC Switch (visual switch): use the CC Switch UI to select Step Plan as the provider and fill in the same base URL and model name.
Verification
Run claude or use the Claude Code UI and check the model name; it should show step-3.5-flash-2603 or step-router-v1 to confirm the integration succeeded.
Demo 1: Code Review Agent
The author uses a multi‑agent stock‑trading project (agent-invest, Java 21 + Spring Boot 4) and focuses on the alert module, which spans backend controllers, services, and frontend enums.
Two constraints are applied:
First round: read‑only review, no code modifications.
Second round: fix at most one issue to avoid large‑scale refactoring.
The agent reads six core files, then searches for related validation classes, test directories, and back‑testing services. It lists seven price‑alert conditions (e.g., ABOVE, BELOW, CROSS_UP, VOLUME_SURGE) and fourteen technical‑indicator conditions (e.g., RSI_OVERBOUGHT, MACD_GOLDEN_CROSS, BOLL_BREAK_LOWER), plus 21 backend whitelist entries.
The agent flags an architectural risk: the frontend enum list and backend whitelist are maintained separately, which can cause mismatches when new conditions are added.
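The drift risk can be checked mechanically by comparing the two lists. A minimal sketch, written in JavaScript to match the article's Node.js demo rather than the project's Java code: `findConditionDrift` is a hypothetical helper, and the two sample lists are shortened illustrations, not the project's real enum or whitelist.

```javascript
// Return condition names present on one side but not the other.
function findConditionDrift(frontendConditions, backendWhitelist) {
  const frontend = new Set(frontendConditions);
  const backend = new Set(backendWhitelist);
  return {
    missingInBackend: [...frontend].filter((c) => !backend.has(c)).sort(),
    missingInFrontend: [...backend].filter((c) => !frontend.has(c)).sort(),
  };
}

// Illustrative lists only: each side has one condition the other lacks.
const drift = findConditionDrift(
  ["ABOVE", "BELOW", "CROSS_UP", "RSI_OVERBOUGHT"],
  ["ABOVE", "BELOW", "CROSS_UP", "MACD_GOLDEN_CROSS"]
);
console.log(drift);
// { missingInBackend: [ 'RSI_OVERBOUGHT' ], missingInFrontend: [ 'MACD_GOLDEN_CROSS' ] }
```

Run as a build-time or CI check, a non-empty result on either side would flag the mismatch before a new condition ships.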
It then produces a structured report covering null‑value protection, cooldown handling, duplicate triggers, parameter validation, trading‑session checks, logging, and test coverage. One false positive is the “duplicate trigger” flag: existing cooldown logic already mitigates it.
Human review confirms the report contains valuable insights but also a few misjudgments, emphasizing that AI‑generated reviews must be verified.
One issue is then fixed: the author manually corrects a time‑zone problem in the isMarketOpen method by explicitly using the Asia/Shanghai zone, adds eleven unit tests covering open, closed, lunch, weekend, and edge cases, and confirms the fix works in about a minute.
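The idea behind the fix, evaluating the clock in an explicit zone rather than the server's default, can be sketched as follows. This is a JavaScript illustration in the style of the article's Node.js demo, not the project's Java code, which the article does not show. The session hours (09:30–11:30 and 13:00–15:00, Monday to Friday) are an assumption, and public holidays are ignored.

```javascript
// Formatter pinned to Asia/Shanghai so the result does not depend on
// the time zone of the machine running the code.
const SHANGHAI_CLOCK = new Intl.DateTimeFormat("en-US", {
  timeZone: "Asia/Shanghai",
  weekday: "short",
  hour: "2-digit",
  minute: "2-digit",
  hourCycle: "h23",
});

// Assumed sessions: 09:30-11:30 and 13:00-15:00 Shanghai time, Mon-Fri.
function isMarketOpen(instant = new Date()) {
  const parts = Object.fromEntries(
    SHANGHAI_CLOCK.formatToParts(instant).map((p) => [p.type, p.value])
  );
  if (parts.weekday === "Sat" || parts.weekday === "Sun") return false;
  const minutes = Number(parts.hour) * 60 + Number(parts.minute);
  const morning = minutes >= 9 * 60 + 30 && minutes < 11 * 60 + 30;
  const afternoon = minutes >= 13 * 60 && minutes < 15 * 60;
  return morning || afternoon;
}

// Monday 2024-06-03, 10:00 in Shanghai (02:00 UTC).
console.log(isMarketOpen(new Date("2024-06-03T02:00:00Z"))); // true
// Same day, 12:00 in Shanghai: lunch break.
console.log(isMarketOpen(new Date("2024-06-03T04:00:00Z"))); // false
```

Passing the instant as a parameter keeps the function easy to test across the open, closed, lunch, and weekend cases the author's unit tests cover.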
Demo 2: Meeting Transcription and Podcast Summary Agent
The second demo builds a closed‑loop pipeline: generate a synthetic meeting audio with TTS, transcribe it with ASR, let a language model summarize the transcript, and finally synthesize a 60‑second audio brief.
Models used:
stepaudio-2.5-tts – generates the synthetic audio.
stepaudio-2.5-asr – speech‑to‑text.
step-3.5-flash-2603 – summarizes the transcript.
A minimal Node.js demo (Node 18+) is provided. Create a working directory, save the script as demo.mjs, set STEP_API_KEY, and run:
export STEP_API_KEY=your_step_plan_api_key
node demo.mjs
The script builds a meeting script array, calls the TTS endpoint to produce meeting-review.mp3, sends the audio to the ASR endpoint to obtain transcript.md, then prompts the chat model to produce a structured summary (summary.md) containing confirmed facts, root‑cause directions, action items, pending questions, and a 60‑second voice script. The voice script is finally turned into incident-summary.mp3.
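The control flow of that script can be sketched without committing to any request format. In the sketch below, the three network steps are injected as functions: `steps.tts`, `steps.asr`, and `steps.summarize` are hypothetical, and their return shapes (`{ text, events }` and `{ summary, voiceScript }`) are assumptions made for illustration. The real endpoints and fields should be taken from the Step Plan API documentation.

```javascript
// Sketch of the four-stage pipeline with the network calls injected.
// `steps.tts`, `steps.asr`, and `steps.summarize` are hypothetical async
// functions; their real request shapes depend on the Step Plan API.
async function runMeetingPipeline(meetingScript, steps) {
  const outputs = {};

  // 1. Synthesize the scripted meeting into audio.
  outputs["meeting-review.mp3"] = await steps.tts(meetingScript.join("\n"));

  // 2. Transcribe; keep the raw events next to the text for debugging.
  const { text, events } = await steps.asr(outputs["meeting-review.mp3"]);
  outputs["transcript.md"] = text;
  outputs["asr-events.json"] = JSON.stringify(events, null, 2);

  // 3. Summarize; assumed to return the summary plus the 60-second script.
  const { summary, voiceScript } = await steps.summarize(text);
  outputs["summary.md"] = summary;

  // 4. Turn the voice script into the final audio brief.
  outputs["incident-summary.mp3"] = await steps.tts(voiceScript);

  return outputs; // file name -> content, ready to be written to disk
}
```

Injecting the steps also allows the orchestration to be exercised with stubs, so the ordering and file naming can be checked without spending quota.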
Key implementation details:
Hotwords (OrderQueryService, Redis, P95) improve ASR accuracy.
The TTS instruction field can embed control commands in parentheses to adjust pauses, speed, or tone.
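One way to see whether the hotwords helped is to check that each one survives into the transcript verbatim. This is a minimal sketch with a hypothetical helper, `missingHotwords`, and a made-up transcript sentence. It does not call the ASR API.

```javascript
// Report hotwords that never appear in the transcript (case-insensitive).
function missingHotwords(transcript, hotwords) {
  const haystack = transcript.toLowerCase();
  return hotwords.filter((word) => !haystack.includes(word.toLowerCase()));
}

// Made-up transcript in which the class name was split into separate words.
const sample = "P95 latency rose after the Redis cache in order query service expired.";
console.log(missingHotwords(sample, ["OrderQueryService", "Redis", "P95"]));
// [ 'OrderQueryService' ]
```

A non-empty result points at identifiers the recognizer split or misspelled, which are the terms to review first in transcript.md.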
After running, the output folder contains five files: the synthetic audio, the ASR result, the structured summary, the final voice brief, and the raw ASR event log (asr-events.json), which is useful for debugging.
Manual verification is still required for:
Confirming that key conclusions actually appeared in the meeting.
Validating owners and deadlines.
Ensuring root‑cause statements have monitoring or log evidence.
Checking for sensitive or customer data.
Assessing whether the audio brief can be shared externally.
Step Router V1: Automatic Task Routing
Why Routing?
Simple Q&A benefits from speed, while multi‑file code reviews need deeper context. step-router-v1 automatically selects between deepseek-v4-pro (strong) and step-3.5-flash (fast) based on perceived task complexity.
How It Works
Configure the router by setting ANTHROPIC_MODEL to step-router-v1 and providing the fast model as ANTHROPIC_SMALL_FAST_MODEL. The router decides internally which backend to invoke.
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your_step_plan_api_key",
"ANTHROPIC_BASE_URL": "https://api.stepfun.com/step_plan",
"ANTHROPIC_MODEL": "step-router-v1",
"ANTHROPIC_SMALL_FAST_MODEL": "step-3.5-flash",
"API_TIMEOUT_MS": "3000000",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
}
}
Practical Experience
The author tested three scenarios:
Simple Q&A about StringBuilder vs StringBuffer – fast response, acceptable answer.
Multi‑file code review – output quality comparable to using the strong model directly.
Complex refactoring request for AlertMonitorService – longer processing, detailed suggestions, but the internal routing decision remains opaque.
The router balances speed and capability, but its black‑box nature means the exact model choice cannot be explicitly controlled.
Conclusion
Step Plan can be integrated into daily workflows. For code review, the agent narrows the review scope to high‑risk files, saving manual effort while still requiring human judgment for final decisions. For voice workflows, the end‑to‑end pipeline converts meeting audio into structured text and a concise audio brief, automating a previously time‑consuming task.
Limitations include the need for human validation of AI‑generated reports, the inability of the router to expose its internal model choice, and the fact that the agent cannot determine root causes on its own.
JavaGuide
Backend tech guide and AI engineering practice covering fundamentals, databases, distributed systems, high concurrency, system design, plus AI agents and large-model engineering.
