Building a Dual‑Engine AI Assistant for DingTalk with Qoder CLI and Claude Code

The article details a complete engineering solution that lets a DingTalk group chat invoke an AI assistant—switchable between Qoder CLI and Claude Code—to query logs, run experiments, analyse performance and even deploy code, while handling intranet constraints, latency, security isolation, and Docker deployment.

Alibaba Cloud Developer

Jun 2, 2026

Background and Challenges

In the Flash Sale search team, daily tasks such as log inspection, performance analysis and experiment management are scattered across SLS logs, the TPP platform and code repositories, which makes the work slow and fragmented. The goal is to converse with an AI assistant inside a DingTalk group that can query logs, view experiments, analyse performance and even deploy code. The core challenges are internal network deployment restrictions, long AI inference latency (30-120 s), security requirements that call for permission isolation, and the need to integrate multiple external tools.

Solution Overview

The final architecture combines DingTalk Stream with a CLI proxy. DingTalk Stream provides a long-lived WebSocket connection that works inside the intranet without a public callback URL. A Java service (alsc-intervene) handles permission checks, context management (LRU + TTL), thread-pool concurrency (10-15 threads) and streaming updates to AI cards. The CLI proxy layer spawns either Qoder CLI or Claude Code, applies line buffering (stdbuf -oL), enforces a 120 s timeout and kills the process on errors. Both engines emit stream-json output, so the Java side can parse it and update AI cards uniformly.

Engine Selection

Qoder CLI was chosen first because it is an internal product with ready-made Skills and an MCP ecosystem, and its CLI mode suits server-side spawning. In complex multi-step troubleshooting, however, Qoder CLI sometimes gave inaccurate answers and lagged behind the desktop IDE. Claude Code was therefore added as a second engine for difficult scenarios, where the author reports noticeably deeper reasoning and higher accuracy.

Docker Deployment

Both engines run in a single Docker container sharing a workspace and MCP configuration. The Dockerfile installs JDK 11, Node.js and the two CLI tools, copies the workspace and credential files, and sets strict file permissions (600 for token files, 444 for client config). The container starts the Java service which connects to DingTalk Stream.

MCP Tool Integration and OAuth Bypass

MCP (Model Context Protocol) standardises AI calls to external APIs. Normally MCP uses an OAuth2 flow that requires a browser, which fails in headless containers. The workaround is to obtain a long‑lived Bearer token offline and inject it statically into .mcp.json headers, skipping the interactive OAuth handshake. Token files are stored with restrictive permissions and refreshed manually when expired.

DingTalk Stream Integration

Stream mode replaces the traditional HTTP callback, allowing the service to initiate a WebSocket connection to DingTalk without exposing any ports. The Java client is built with OpenDingTalkStreamClientBuilder, registers a message listener and starts the connection. Chat handling uses ProcessBuilder to launch the selected CLI, forces line buffering, reads output with a 256‑byte BufferedReader, and aborts the process if it exceeds 120 s.

Key Design Points

stdbuf -oL forces line buffering to avoid the 4 KB Node.js block buffering.

256‑byte BufferedReader ensures timely streaming.

Process‑level isolation lets each request run in its own process.

Exception handling destroys the process immediately to stop consuming inference quota.
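The launch, stream and timeout behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's actual code: the class name CliRunner, the method signature and the line callback are hypothetical, and the command list is supplied by the caller (in the article's setup it would be the selected CLI prefixed with stdbuf -oL).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper; names and signature are illustrative, not from the article.
final class CliRunner {

    /**
     * Starts the command, forwards each output line to onLine as it arrives,
     * and force-kills the process if it runs longer than timeoutMillis.
     * Returns true only if the process exited with code 0 within the timeout.
     */
    static boolean run(List<String> command, long timeoutMillis, Consumer<String> onLine)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();

        // Read on a separate thread so the timeout is enforced even when the
        // process produces no output at all.
        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8), 256)) {
                String line;
                while ((line = in.readLine()) != null) {
                    onLine.accept(line);
                }
            } catch (IOException e) {
                // The stream was closed because the process was killed; nothing left to forward.
            }
        });
        reader.setDaemon(true);
        reader.start();

        boolean finished = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
        if (!finished) {
            process.destroyForcibly(); // stop a runaway request immediately
        }
        reader.join(1000); // give the reader a moment to drain remaining lines
        return finished && process.exitValue() == 0;
    }
}
```

With the article's numbers, the timeout argument would be 120_000 ms and the callback would push each line to the AI card; both the callback body and the exact CLI arguments are left to the caller here because the source does not spell them out.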

User Context Management

A three‑layer protection scheme stores per‑user context in a LinkedHashMap with LRU eviction: TTL 48 h clears inactive users, a sliding window limits each user to 200 KB, and a global LRU caps total users at 500.
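One way to picture the three layers is the sketch below. It is an assumption-laden illustration rather than the service's real class: UserContextStore, its method names and the choice to pass the current time explicitly are all hypothetical, and the source does not say how message size is measured (UTF-8 bytes are assumed here).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-user context store with TTL, size window and LRU cap.
final class UserContextStore {

    private static final class UserContext {
        final Deque<String> messages = new ArrayDeque<>();
        int bytes;
        long lastAccessMillis;
    }

    private final int maxBytesPerUser;
    private final long ttlMillis;
    private final LinkedHashMap<String, UserContext> byUser;

    UserContextStore(final int maxUsers, int maxBytesPerUser, long ttlMillis) {
        this.maxBytesPerUser = maxBytesPerUser;
        this.ttlMillis = ttlMillis;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.byUser = new LinkedHashMap<String, UserContext>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, UserContext> eldest) {
                return size() > maxUsers; // global LRU cap on the number of users
            }
        };
    }

    synchronized void append(String userId, String message, long nowMillis) {
        evictExpired(nowMillis);
        UserContext ctx = byUser.get(userId);
        if (ctx == null) {
            ctx = new UserContext();
            byUser.put(userId, ctx);
        }
        ctx.lastAccessMillis = nowMillis;
        ctx.messages.addLast(message);
        ctx.bytes += utf8Length(message);
        // Sliding window: drop the oldest messages, always keeping the newest one.
        while (ctx.bytes > maxBytesPerUser && ctx.messages.size() > 1) {
            ctx.bytes -= utf8Length(ctx.messages.removeFirst());
        }
    }

    synchronized List<String> history(String userId, long nowMillis) {
        evictExpired(nowMillis);
        UserContext ctx = byUser.get(userId);
        if (ctx == null) {
            return new ArrayList<>();
        }
        ctx.lastAccessMillis = nowMillis;
        return new ArrayList<>(ctx.messages);
    }

    synchronized int userCount() {
        return byUser.size();
    }

    // TTL layer: remove users whose last access is older than ttlMillis.
    private void evictExpired(long nowMillis) {
        Iterator<UserContext> it = byUser.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().lastAccessMillis > ttlMillis) {
                it.remove();
            }
        }
    }

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

With the figures from the article, the store would be created as new UserContextStore(500, 200 * 1024, 48L * 60 * 60 * 1000).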

Permission Isolation

Administrators have full read/write and deployment rights, while regular users operate in read‑only mode enforced by the command‑priority layer.

Concurrency Control

A thread‑pool executor (core 10, max 15, queue 30) manages concurrent CLI invocations.
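As a rough sketch, those numbers map onto java.util.concurrent.ThreadPoolExecutor as shown below. The factory class name is hypothetical, and the 60 s keep-alive and the AbortPolicy rejection handler are assumptions, since the source gives only the core size, maximum size and queue length.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory; only the sizes 10 / 15 / 30 come from the article.
final class CliExecutorFactory {

    static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                10,                                    // core threads
                15,                                    // maximum threads
                60L, TimeUnit.SECONDS,                 // idle keep-alive for extra threads (assumed)
                new ArrayBlockingQueue<>(30),          // bounded queue of waiting requests
                new ThreadPoolExecutor.AbortPolicy()); // reject when saturated (assumed)
    }
}
```

With a bounded queue, the executor adds threads beyond the core size only once the queue is full, so this configuration holds at most 15 running and 30 waiting invocations; the next submission is rejected with RejectedExecutionException under the assumed policy.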

Knowledge Self‑Evolution

The system records five levels of knowledge—from git history (L0) to formal rules (L4). When a candidate rule is triggered ≥3 times with ≥80 % success, it is promoted automatically.
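The promotion threshold can be expressed as a small check, sketched here under assumptions: the class and method names are hypothetical, and the source does not say how a trigger is judged successful, so the caller passes that in as a boolean.

```java
// Hypothetical tracker for one candidate rule; only the 3-trigger and 80 % thresholds come from the article.
final class CandidateRule {

    private static final int MIN_TRIGGERS = 3;
    private static final int MIN_SUCCESS_PERCENT = 80;

    private int triggers;
    private int successes;

    // Record one trigger of the rule and whether it led to a successful outcome.
    void record(boolean succeeded) {
        triggers++;
        if (succeeded) {
            successes++;
        }
    }

    // True once the rule has at least 3 triggers and a success rate of at least 80 %.
    boolean readyForPromotion() {
        // Integer arithmetic avoids floating-point rounding at the 80 % boundary.
        return triggers >= MIN_TRIGGERS && successes * 100 >= triggers * MIN_SUCCESS_PERCENT;
    }
}
```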

Deployment Checklist

Create a DingTalk internal robot and enable Stream mode.

Configure appKey, appSecret and robot code via environment injection.

Set dingtalk.stream.enabled=true for testing, false in production.

Define AI card template ID in application.properties.

Runtime Experience

Performance traces show real-time streaming updates, and the original article's screenshots illustrate successful log queries, experiment checks and code deployments. The author notes four pitfalls: stdbuf is essential, the MCP token format must be a JSON array, running multiple Stream instances causes duplicate message handling, and the AI-card streaming permission must be granted.

Conclusion

The DingTalk Stream + CLI proxy architecture delivers an intranet‑only, real‑time AI assistant with secure permission isolation, static Bearer‑token MCP access, interchangeable engines, and production‑grade stability through thread‑pooling, timeouts and multi‑layer LRU protection.

