How Taobao’s Homepage Migrated to Serverless: Architecture Upgrade and Quality Assurance
This article details Taobao’s serverless architecture upgrade for its homepage, covering the background of the Cloud Native 2.0 initiative, the three‑pronged system transformation, risk analysis, and a comprehensive quality‑assurance plan that includes pre‑release verification, traffic switching, and stress testing.
Background
Alibaba Group’s Taobao is advancing its Cloud Native 2.0 campaign by upgrading to a serverless architecture. The homepage is the first pilot, and its smooth migration determines whether other services can follow.
System Transformation Plan
The upgrade involves three major areas:
Business code refactor: Extract a business‑base layer in the new environment, move third‑party packages into this layer, and modify the Maven profile so a single codebase can run in both the old (non‑serverless) and new (serverless) environments. Dependencies are isolated by deploying different third‑party packages per environment and adjusting bean initialization accordingly.
Release process refactor: Split the migration into three phases (before, during, and after the switch). Before the switch, separate pipelines deploy to the old and new environments independently; during the switch, a single pipeline publishes to both; after the switch, only the new‑environment pipeline remains.
Traffic‑switching method: Use an internal traffic‑switching system in which the old and new environments belong to different application groups, each bound to its own cluster key. The gateway routes traffic based on these keys, which allows a gradual gray release.
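The key‑based routing described above can be illustrated with a minimal sketch. It is not Taobao's internal traffic‑switching system: the cluster key names, the hashing scheme, and the function names are assumptions made for illustration. The sketch hashes each user into a stable bucket so that raising the gray‑release percentage only ever moves users from the old environment to the new one.

```python
import hashlib

# Hypothetical cluster keys; the real keys are internal to Taobao.
OLD_CLUSTER_KEY = "homepage-legacy"
NEW_CLUSTER_KEY = "homepage-serverless"


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_cluster_key(user_id: str, new_env_percent: int) -> str:
    """Choose the cluster key a gateway would route this user to.

    Users whose bucket falls below new_env_percent go to the new
    environment; everyone else stays on the old one.
    """
    if not 0 <= new_env_percent <= 100:
        raise ValueError("new_env_percent must be between 0 and 100")
    if bucket_for(user_id) < new_env_percent:
        return NEW_CLUSTER_KEY
    return OLD_CLUSTER_KEY
```

Because the bucket depends only on the user ID, setting the percentage back to 0 acts as an immediate rollback to the old environment.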
Risk Analysis
The upgrade introduces high uncertainty and an unknown impact scope, requiring full regression testing across all business scenarios. The homepage’s long‑standing, complex components make exhaustive coverage difficult.
Quality‑Assurance Strategy
To mitigate risks, a multi‑layer interception approach is adopted:
Offline: Ensure comprehensive scenario coverage before release.
Online: Observe detailed metrics after launch and roll back quickly via traffic switching if issues arise.
Pre‑Release Verification
Core functionality review and validation of all business scenarios involving third‑party packages.
Record‑and‑replay testing to capture missed scenarios.
Because the standard record‑and‑replay platform cannot compare responses from two IPs at the same time and cannot handle Taobao’s complex comparison rules, a custom assertion framework was built on top of the existing tool. It compares results from the old environment (the baseline) with results from the new environment.
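A baseline‑versus‑new comparison of this kind might look like the sketch below. It is only an illustration under stated assumptions: the article does not describe the actual assertion framework, so the function name, the path syntax, and the idea of an ignore list for fields that legitimately differ (such as trace IDs) are all hypothetical. Responses are assumed to be JSON‑like structures with string keys.

```python
from typing import Any, FrozenSet, List


def diff_responses(
    baseline: Any,
    candidate: Any,
    ignored_paths: FrozenSet[str] = frozenset(),
    path: str = "",
) -> List[str]:
    """Return readable differences between an old-environment response
    (baseline) and a new-environment response (candidate).

    Paths listed in ignored_paths are skipped entirely.
    """
    if path in ignored_paths:
        return []

    if isinstance(baseline, dict) and isinstance(candidate, dict):
        diffs: List[str] = []
        for key in sorted(set(baseline) | set(candidate)):
            child = f"{path}.{key}" if path else key
            if child in ignored_paths:
                continue
            if key not in candidate:
                diffs.append(f"{child}: missing in new environment")
            elif key not in baseline:
                diffs.append(f"{child}: unexpected in new environment")
            else:
                diffs.extend(
                    diff_responses(baseline[key], candidate[key], ignored_paths, child)
                )
        return diffs

    if isinstance(baseline, list) and isinstance(candidate, list):
        if len(baseline) != len(candidate):
            return [f"{path}: list length {len(baseline)} != {len(candidate)}"]
        diffs = []
        for index, (old_item, new_item) in enumerate(zip(baseline, candidate)):
            diffs.extend(
                diff_responses(old_item, new_item, ignored_paths, f"{path}[{index}]")
            )
        return diffs

    if baseline != candidate:
        return [f"{path}: {baseline!r} != {candidate!r}"]
    return []
```

A replayed request passes when the returned list is empty; any entry points to the exact field where the two environments disagree.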
Release Phase
Whitelist verification: Validate core scenarios with a whitelist.
Record‑and‑replay: Once live traffic is routed to the serverless environment, replay recorded requests there and verify the results in parallel.
Single‑machine stress test: Conduct dual‑environment isolation testing, selecting the homepage’s main interface and post‑purchase flow as representative endpoints. Steps include running synchronized tests in both isolated environments and comparing system metrics.
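The final step, comparing system metrics from the two isolated environments, can be sketched as follows. The article does not say which metrics or thresholds were used, so the metric names, the 10% tolerance, and the assumption that a lower value is better for every metric are all illustrative.

```python
from typing import Dict


def find_regressions(
    old_metrics: Dict[str, float],
    new_metrics: Dict[str, float],
    tolerance: float = 0.10,
) -> Dict[str, float]:
    """Return the metrics where the new environment is worse than the old
    one by more than the given relative tolerance.

    Assumes lower is better for every metric (CPU usage, latency,
    error rate). The value in the result is the relative increase.
    """
    regressions: Dict[str, float] = {}
    for name, old_value in old_metrics.items():
        new_value = new_metrics[name]
        if old_value == 0:
            # Any increase from zero is treated as an unbounded regression.
            worse_by = float("inf") if new_value > 0 else 0.0
        else:
            worse_by = (new_value - old_value) / old_value
        if worse_by > tolerance:
            regressions[name] = worse_by
    return regressions
```

For example, with the same load applied to both environments, a p99 latency of 100 ms on the old environment and 130 ms on the new one would be flagged as a 30% regression, while a CPU increase from 50% to 52% would fall within the tolerance.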
Gray Release During Promotion
During major sales events, stress testing must continue without being blocked by the new environment. A dual‑environment stress‑test isolation model is used: the old environment is tested at 100% of the traffic, while the new environment is tested at a proportional share. If issues arise in the new environment, testing stops there while testing of the old environment proceeds.
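The isolation model can be illustrated with a small planning sketch. It does not reflect Taobao's stress‑testing platform: the ramp steps, the health‑check callback, and all names are assumptions. The point it shows is that the old environment always follows the full ramp, while load on the new environment drops to zero once a problem is detected there.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StressStep:
    old_env_qps: int
    new_env_qps: int


def plan_stress_steps(
    target_qps: int,
    ramp_percents: List[int],
    new_env_percent: int,
    new_env_healthy: Callable[[int], bool],
) -> List[StressStep]:
    """Build the load applied to each environment at every ramp step.

    The old environment ramps up to target_qps regardless of what happens
    in the new one. The new environment receives new_env_percent of the
    old environment's load until new_env_healthy reports a problem, after
    which it receives no further load.
    """
    steps: List[StressStep] = []
    new_env_active = True
    for percent in ramp_percents:
        old_qps = target_qps * percent // 100
        new_qps = old_qps * new_env_percent // 100 if new_env_active else 0
        steps.append(StressStep(old_qps, new_qps))
        # Check health after applying this step's load to the new environment.
        if new_env_active and not new_env_healthy(new_qps):
            new_env_active = False
    return steps
```

In a real run the health check would read live metrics from the new environment; here it is a plain callback so the stopping behavior is easy to see.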
Summary
After incremental traffic switches of 10% during the 618 promotion and 50‑80% between the 618 and 88 promotions, a full 100% switch was completed on August 2. The record‑and‑replay comparison solution proved effective and will be reused for routine regression and production‑safety regression to improve overall test coverage.
