Can AI Safely Write Code for High‑Risk Backend Systems? Lessons from Tencent’s CDN LEGO Project
While AI coding hype centers on front‑end page generation, the real test is whether AI can be trusted to write code for a million‑line, high‑availability CDN backend. This article details Tencent’s systematic exploration: a 20‑day Rust proxy prototype, a five‑layer Harness Engineering framework, and concrete data showing both breakthroughs and remaining risks.
Introduction
AI coding has been demonstrated on front‑end demos, but its safety for mission‑critical back‑ends remains unproven. Tencent’s CDN core framework LEGO processes millions of requests per second, supports HTTP/1.1, HTTP/2, HTTP/3, WebSocket, and TLS, and integrates heavily modified third‑party libraries (OpenSSL, QUIC, Lua, JavaScript, etc.). The combinatorial configuration space exceeds 13,824 × N; at this scale, a single erroneous line could cause a network‑wide outage.
Nonstop – A Zero‑Human Rust Proxy
To probe AI’s limits, the team built nonstop, a Rust‑based L4/L7 proxy, in 20 days using only AI‑generated code. Features include:
Full L4/L7 proxying, HTTP/3 + QUIC, built‑in WAF, V8 JavaScript workers, single‑binary deployment, hot‑load without downtime.
Benchmark: 42,052 QPS at 5,000 concurrent connections with zero errors, P50 latency of 1.1 ms, and a six‑layer defense‑in‑depth security design.
The prototype proved AI can produce functional code, but raised questions about applying AI to a massive C++ codebase like LEGO.
Core Problems in Large‑Scale Backend AI Coding
Analyzing 57 real incidents across multiple projects revealed 13 problem categories and five root causes. The most critical issues are:
Async semantic misuse (blocking send in Tokio) – Critical – source: nonstop.
Hallucination (non‑existent API calls, fabricated RFC sections) – High – source: multiple projects.
Partial updates without cleanup – High – source: nonstop.
Configuration‑implementation mismatch – High – source: nonstop.
Security blind spots (timing attacks, SSRF, JWT) – Critical – source: nonstop.
Root causes: AI lacks “uncertainty awareness” and a global view, leading to confident but wrong answers, hallucinations, incomplete changes, and missing environment context.
Harness Engineering – A Structured AI‑Assisted Development Framework
The team introduced a five‑layer architecture that enforces context, constraints, and feedback at every step.
Layer 1 – Permission & Security Base: sandboxed execution, network isolation.
Layer 2 – Code Rules as a Compiler: declarative constraints (e.g., “no raw new; use unique_ptr”).
Layer 3 – Process Constraints: mandatory test‑then‑review workflow (function → unit test → code review).
Layer 4 – Context Construction: project constitution (Agent.md), security discipline (anti‑example rules), domain‑knowledge libraries, expert skills, up‑to‑date RFC corpora (38k lines stored locally).
Layer 5 – Feedback Loops: automatic hook collection, pitfall journal, inline CLAUDE.md annotations, continuous A/B validation.
All AI output passes through these layers, turning vague expectations into enforceable constraints. For example, the rule “prohibit raw new; use unique_ptr” is obeyed by the model 100% of the time, whereas the vague expectation “write high‑quality code” yields unstable results.
Three‑Way Adversarial Code Review (CR)
To mitigate single‑model blind spots, LEGO runs three independent reviewers (Claude, Codex, Gemini). Each model produces its own issue set; a manager aggregates them into cr_report.md. Overlap indicates high confidence, while unique findings trigger deeper investigation. This approach discovers deeper defects than static analysis and reduces false positives.
Concrete Constraints Derived from Real Pitfalls
From the 57 incidents, five machine‑readable constraints were codified and enforced:
Research one competitor at a time – prevents cross‑contamination of C and C++ patterns.
Disable network access for research agents – avoids stale or inaccurate web results.
Skip analysis when source code is unavailable – prevents fabricated source analysis.
Never modify lego_server directly – maintains responsibility isolation.
Restrict search scope to project directories – prevents pollution from system files.
These constraints are expressed as rules that the AI cannot violate.
Quantitative Benefits
Applying Harness Engineering to LEGO yielded measurable efficiency gains:
Competitor research reduced from 3 person‑days to 1 day (~3× speedup).
Design time cut from 2–3 person‑days to 1 day (~2×).
Protocol security testing compressed from 3–5 person‑days to 1 day (~4×).
Code‑review waiting time dropped from 1–3 days to 30 minutes.
Overall workflow efficiency improved by ~20% after accounting for the learning curve.
Quality metrics after adoption:
cpplint compliance > 95%.
CVE coverage 100%.
Zero production crashes during the evaluation period.
Case Study – Fixing a Read‑Write Race in cpuinfos
Adversarial CR quickly identified the race condition.
AI generated three candidate solutions: ReadWriteLock, atomic<shared_ptr>, and a zero‑overhead double‑buffer with an atomic index.
Automated tests validated each; the double‑buffer was selected for zero performance overhead.
Development time shrank from 5 days to 1 day; a secondary issue (thread initialization) surfaced for later refinement.
The fix delivered a 2.0× speedup in resolution and eliminated the race without performance penalty.
Differentiation from Existing Tools
Compared with GitHub Copilot (single model, serial review) and OpenAI Codex (serial two‑model review), LEGO’s approach uses three parallel models, cross‑iteration, debate‑style validation, and automatic convergence, achieving deeper defect discovery and higher confidence.
Role Evolution in the AI Coding Era
Junior developers become AI operators, mastering prompts and skill libraries.
Senior developers become Harness Engineers, designing constraints and context.
Architects focus on human‑AI collaboration architecture (deciding what to automate vs. manual).
Test & security engineers evolve into AI Quality Engineers and AI Security Experts, building test loops and safety skills.
The core capability across all roles is abstract thinking: knowing what to delegate to AI, how to verify it, and how to encode domain knowledge into reusable skills.
Team‑Building Roadmap
Months 1‑2 (Learn to Use) : adopt the full workflow, adversarial review, and 14 security rules.
Months 2‑4 (Learn to Build) : core members author team‑specific skills, run A/B experiments, share knowledge.
Months 4‑12 (Learn to Evolve) : scale Harness automation, enable cross‑team knowledge sharing, close the feedback loop for continuous improvement.
The attitude balance is to be cautious (review every AI line), aggressive (apply AI to high‑frequency tasks), and evangelistic (make AI‑assisted development a cultural norm).
Conclusion
AI coding is not about replacing engineers; it is about redefining the engineering process. LEGO’s Harness Engineering turns every pitfall into a rule, every rule into a skill, and every skill into a collective knowledge asset that accelerates future work. The real value lies in the sustainable, self‑evolving system rather than isolated speed gains.
Tencent Technical Engineering
Official account of Tencent Technology. A platform for publishing and analyzing Tencent's technological innovations and cutting-edge developments.