How We Built a Stable Regression Environment for Over 1,000 Services
This article details the challenges, planning, and step‑by‑step execution of creating an isolated regression environment for a large micro‑service platform, covering quantifiable goals, solution evaluation, infrastructure refactoring, data migration, deployment results, and future improvements.
Background
Managing offline environments for the Yanxuan platform is a long‑term, complex task, and building a regression environment became the most urgent, actionable part of it. After months of trial and error, the team defined quantitative targets, drafted implementation plans, and worked through four steps: cluster setup, infrastructure refactoring, business service deployment, and data construction.
Why Offline Environment Governance?
Three pain points stood out: integration, testing, and regression work shared the same environments, which lowered efficiency; the environments were unstable, which led to service outages; and historical operations were not standardized, which prevented rapid environment replication.
To address these, Yanxuan planned a dedicated regression environment in 2020, ensuring core services have a stable platform for pre‑release verification without interference from other environments.
Why Move the Offline Environment to the Cloud?
Two deployment models exist: VM‑based (cloud‑outside) and container‑based (cloud‑inside). Cloud‑inside offers lighter weight, easier scaling, and faster environment cloning and destruction.
Many services have already migrated to the cloud, but they are still only a fraction of the total. The goal is therefore to deploy the regression environment cloud‑inside wherever possible, building on the services that already run in the cloud and encouraging the remaining services to migrate.
How We Conducted Regression Environment Construction
Quantifiable Goal
The services most prone to blockage are the B‑side supply‑chain core services (≈75 services), whose long dependency chains cover procurement, planning, inventory, quality control, distribution, packaging, and supply‑chain management.
We identified roughly 180 core‑link services across the platform and set the goal of building an isolated regression environment for them, with the B‑side services as the priority.
Feasible Solution
Two options were evaluated:
Option 1: Build an independent, data‑isolated environment.
Option 2: Use traffic‑shading (tagging requests so that they are routed to a specific set of service instances) to create a dynamic, extensible cluster containing only frequently changed services.
Option 1 guarantees data isolation and stability but incurs higher resource costs. Option 2 saves resources but depends on accurate service identification and mature traffic‑shading technology, which was not yet fully reliable.
After analysis, we chose Option 1.
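To make the trade-off concrete, here is a minimal sketch of the routing idea behind Option 2. It is not Yanxuan's implementation: the `Registry` class, the lane names, and the addresses are all hypothetical. It only illustrates why the approach depends on correctly identifying which services belong in a lane, since every other call silently falls back to a shared baseline.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Maps service name -> {lane name -> instance address} (hypothetical structure)."""
    instances: dict = field(default_factory=dict)

    def register(self, service: str, lane: str, address: str) -> None:
        self.instances.setdefault(service, {})[lane] = address

    def resolve(self, service: str, lane: str, baseline: str = "baseline") -> str:
        lanes = self.instances.get(service, {})
        # Prefer the instance deployed in the request's lane; otherwise
        # fall back to the shared baseline instance of that service.
        if lane in lanes:
            return lanes[lane]
        if baseline in lanes:
            return lanes[baseline]
        raise LookupError(f"no instance of {service!r} for lane {lane!r} or baseline")


registry = Registry()
registry.register("inventory", "baseline", "10.0.0.1:8080")
registry.register("procurement", "baseline", "10.0.0.2:8080")
registry.register("inventory", "feature-x", "10.0.1.1:8080")

# A request tagged "feature-x" reaches the changed inventory service,
# but uses the baseline instance of the unchanged procurement service.
print(registry.resolve("inventory", "feature-x"))
print(registry.resolve("procurement", "feature-x"))
```

In this model the baseline instances are shared by every lane, so data written through them is not isolated, which is the gap that Option 1 avoids at the cost of more resources.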
Implementation Steps
1. Regression Cluster Pre‑setup
The operations team prepared logical deployment units (LDCs) for both cloud‑inside and cloud‑outside services, handling network whitelists and data isolation.
2. Infrastructure Refactoring
Infrastructure services were split into two categories: those tightly coupled with business data (e.g., BPM, workflow, permission center) requiring independent deployment, and shared services (e.g., release system, gateway, configuration center, messaging, scheduling, APM, logging, file upload) that could be reused after configuration checks.
Shared services that lacked multi‑environment support were identified and scheduled for refactoring before the rollout.
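The triage described in this step can be sketched as a small classification routine. This is only an illustration under assumptions: the `InfraService` type, the `supports_multi_env` flag, and the flag values in the sample list are hypothetical and do not come from the team's actual inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfraService:
    name: str
    coupled_to_business_data: bool  # e.g. BPM, workflow, permission center
    supports_multi_env: bool        # illustrative flag, not data from the article


def plan_infrastructure(services):
    """Sort infrastructure services into the three outcomes of the triage."""
    plan = {"deploy_independently": [], "reuse_shared": [], "refactor_first": []}
    for svc in services:
        if svc.coupled_to_business_data:
            # Tightly coupled to business data: needs its own deployment.
            plan["deploy_independently"].append(svc.name)
        elif svc.supports_multi_env:
            # Shared service that can already serve several environments.
            plan["reuse_shared"].append(svc.name)
        else:
            # Shared service that must gain multi-environment support first.
            plan["refactor_first"].append(svc.name)
    return plan


services = [
    InfraService("workflow", True, True),
    InfraService("gateway", False, True),
    InfraService("hypothetical-legacy-tool", False, False),
]
plan = plan_infrastructure(services)
print(plan)
```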
3. Business Service Deployment
Approximately 180 core services were prioritized, focusing first on supply‑chain, inventory, product, and finance domains. Dependency analysis using Snet guided deployment order, ensuring critical services were available before downstream ones.
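Deriving a deployment order from a dependency graph is a topological sort. The sketch below uses Python's standard `graphlib` module rather than Snet, whose interface is not described here, and the dependency map is a made-up example rather than the real service graph. It groups services into waves so that everything in a wave can be deployed in parallel once the previous wave is up.

```python
from graphlib import TopologicalSorter

# service -> services it depends on (hypothetical edges for illustration)
dependencies = {
    "supply-chain": {"inventory", "product"},
    "inventory": {"product"},
    "finance": {"supply-chain"},
    "product": set(),
}


def deployment_waves(deps):
    """Return lists of services; each list depends only on earlier lists."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = deployment_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

A circular dependency surfaces as a `CycleError` at `prepare()`, which is a useful early signal that two services need to be deployed together or decoupled.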
4. Data Construction / Migration
After service deployment, configuration data was largely ready, while business data was either migrated from production/testing databases or generated via data‑generation platforms and automated scripts.
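As an illustration of the scripted route, the following sketch generates reproducible synthetic records. The record shape (`order_id`, `sku`, `quantity`, `created_on`) and the value ranges are assumptions made for the example, not the schema of any Yanxuan system or data‑generation platform.

```python
import random
from datetime import date, timedelta


def generate_orders(count, seed=0):
    """Generate synthetic purchase orders; the same seed yields the same data."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    start = date(2020, 1, 1)
    orders = []
    for i in range(count):
        orders.append({
            "order_id": f"PO-{i:06d}",
            "sku": f"SKU-{rng.randint(1, 50):04d}",
            "quantity": rng.randint(1, 500),
            "created_on": (start + timedelta(days=rng.randint(0, 364))).isoformat(),
        })
    return orders


orders = generate_orders(5, seed=42)
for order in orders:
    print(order)
```

Seeding the generator keeps test data stable across rebuilds of the environment, so a failing regression case can be reproduced with the same inputs.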
Results
Within a month, the regression environment reached the quantitative target: all identified core services were deployed, with >90% of them running in cloud‑inside mode.
Service availability was validated by QA, and many isolated services are now ready for pre‑release regression testing.
Future Outlook
Upcoming goals include fully utilizing the regression environment for B‑side testing, expanding coverage to all services that need isolated testing, and enhancing capabilities such as lane isolation, advanced traffic‑shading, and one‑click environment provisioning and teardown.
Yanxuan Tech Team
NetEase Yanxuan Tech Team shares e-commerce tech insights and quality finds for mindful living. This is the public portal for NetEase Yanxuan's technology and product teams, featuring weekly tech articles, team activities, and job postings.
