
Design and Implementation of a Regression Environment for Yanxuan Offline Systems

To overcome inefficient, unstable offline testing, Yanxuan built an isolated regression environment: it provisioned paired cloud‑inside and cloud‑outside clusters, refactored infrastructure, deployed roughly 180 core B‑side services, and constructed or migrated regression data. More than 90% of the deployed services run cloud‑inside, and the environment now supports stable pre‑release verification, with further expansion planned.

NetEase Yanxuan Technology Product Team

Yanxuan's offline environment governance is a long‑term, complex effort, and within it, building a regression environment has become an urgent, actionable task. Over several months of trial and error, the team has made measurable progress: defining quantitative goals, outlining an implementation plan, and working through the detailed steps of cluster planning, infrastructure review, service mapping, and regression data construction.

Background

Why manage the offline environment? The current offline setup suffers from three major pain points: (1) mixed integration, testing, and regression environments lead to low efficiency; (2) unstable environments cause long‑dependency services to become unavailable, blocking testing; (3) excessive historical operations make it hard to quickly replicate a usable environment. To address these, Yanxuan started planning a dedicated regression environment in mid‑2020, aiming to provide a stable platform for pre‑release verification that is isolated from frequent updates in other environments.

Why move the offline environment to the cloud? Two deployment models exist: VM‑based (cloud‑outside) and container‑based (cloud‑inside). Cloud‑inside deployments are lighter weight and easier to scale and migrate, which makes it possible to replicate and tear down environments quickly. Many services have already been migrated, but they are still only a fraction of the more than 1,000 services in total. The regression environment therefore prioritizes cloud‑inside services, which also encourages further migration.

How to Build the Regression Environment

The team identified the most frequently blocked services in regression testing and mapped their dependencies, arriving at a quantifiable target and a concrete solution.

Quantifiable Goal

The most blockage‑prone services belong to the B‑side supply‑chain domain, with 75 core services and many related services (procurement, planning, inventory, quality control, distribution, packaging, etc.). In total, about 180 core‑link services across the site were identified. The goal is to build an isolated regression environment for these B‑side services and their dependencies.

Solution Options

Two options were considered: (1) create an independent, resource‑isolated environment; (2) build a dynamic, scalable cluster using traffic‑coloring to include only frequently changed services. The first option guarantees data isolation and stability but incurs higher resource costs. The second relies on mature traffic‑coloring technology and accurate service identification; it saves resources but is more complex and currently not fully feasible.
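To make option 2 more concrete, here is a minimal sketch of the routing rule that traffic‑coloring generally relies on: a request carries a color, each hop prefers an instance in the matching lane, and it falls back to a shared baseline instance when no such instance exists. This is not Yanxuan's implementation; the `Registry` class, the lane names, and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field

BASELINE = "baseline"  # hypothetical name for the shared, stable lane


@dataclass
class Registry:
    """Maps service name -> {lane color -> instance address} (illustrative only)."""
    instances: dict = field(default_factory=dict)

    def register(self, service: str, color: str, address: str) -> None:
        self.instances.setdefault(service, {})[color] = address

    def resolve(self, service: str, color: str) -> str:
        """Prefer the instance in the request's lane, else fall back to baseline."""
        lanes = self.instances[service]
        return lanes.get(color, lanes[BASELINE])


registry = Registry()
# Every service has a baseline instance; only changed services get a colored one.
registry.register("inventory", BASELINE, "inventory.base:8080")
registry.register("procurement", BASELINE, "procurement.base:8080")
registry.register("procurement", "feature-x", "procurement.fx:8080")


def route_chain(services, color):
    """Resolve every hop of a call chain for one colored request."""
    return [registry.resolve(name, color) for name in services]
```

The sketch also shows why this option depends on accurate service identification: any changed service that is not registered under its color silently receives baseline traffic.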

After analysis, the team chose option 1: an independent, data‑isolated environment. The implementation consists of four major steps:

1. Regression Cluster Provisioning

Provisioning of the regression cluster is led by the Yanxuan operations team, with resources provided by the Hangzhou Lightboat team. Two LDCs (Logical Data Centers) are created: a cloud‑inside LDC for containerized services, and a cloud‑outside LDC for services that are difficult to migrate.

The cluster also requires special network whitelisting to allow shared base services or data imports from online or test environments.
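As a rough illustration of what such a whitelist check amounts to, the sketch below tests whether a source address falls inside an allowed range. The article does not give the real ranges or the mechanism used to enforce them, so the group names and CIDR blocks here are placeholders.

```python
import ipaddress

# Hypothetical CIDR ranges; the real ranges are not given in the article.
WHITELIST = {
    "shared-base-services": ["10.10.0.0/16"],
    "data-import-sources": ["10.20.5.0/24", "10.30.0.0/20"],
}


def is_allowed(ip: str, whitelist=WHITELIST) -> bool:
    """Return True if the address belongs to any whitelisted network."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(cidr)
        for cidrs in whitelist.values()
        for cidr in cidrs
    )
```

Keeping the ranges grouped by purpose makes it easier to audit why each cross‑environment path was opened.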

2. Infrastructure Refactoring

Infrastructure services are divided into two categories: (a) business‑data‑critical services (e.g., BPM, workflow, permission center, message center) that must be independently deployed; (b) shared infrastructure services (e.g., Opera release system, gateway, Apollo config, Snet governance, messaging, Dschedule, APM, logging, file upload). Shared services need to be evaluated for multi‑environment support; some require configuration changes, others only network verification. Services lacking multi‑environment design (e.g., the development workbench) must be adapted.
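The two categories above can be recorded as a simple lookup table so that every infrastructure service has an explicit deployment decision. The sketch below uses the service names listed in the text; the `Deployment` enum and the helper functions are illustrative and not part of any Yanxuan tooling.

```python
from enum import Enum


class Deployment(Enum):
    DEDICATED = "deploy a separate instance in the regression environment"
    SHARED = "reuse the existing instance after multi-environment checks"


# Category (a): business-data-critical services, deployed independently.
DEDICATED_SERVICES = ["BPM", "workflow", "permission center", "message center"]

# Category (b): shared infrastructure services.
SHARED_SERVICES = [
    "Opera release system", "gateway", "Apollo config", "Snet governance",
    "messaging", "Dschedule", "APM", "logging", "file upload",
]

INFRA = {name: Deployment.DEDICATED for name in DEDICATED_SERVICES}
INFRA.update({name: Deployment.SHARED for name in SHARED_SERVICES})


def deployment_mode(service: str) -> Deployment:
    """Look up the decision; unknown services raise KeyError on purpose."""
    return INFRA[service]


def services_in(mode: Deployment) -> list:
    """List all services assigned to one deployment mode."""
    return [name for name, m in INFRA.items() if m is mode]
```

Raising on unknown names forces each newly discovered infrastructure dependency to be classified before it is relied on.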

3. Business Service Deployment

Approximately 180 core services spanning supply‑chain, product, inventory, data platform, finance, technical platform, and operations were prioritized. Dependency analysis using Snet identified three primary backbone services (supply‑chain, inventory, product center) as prerequisites, with other services scheduled based on priority and resource conflicts.
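Scheduling deployments from a dependency graph is essentially a topological ordering problem. The sketch below groups services into deployment waves with Python's standard `graphlib` module. The three backbone services come from the text, while the remaining edges are invented for illustration and do not reflect the real Snet dependency data.

```python
from graphlib import TopologicalSorter

# service -> set of services it depends on.
# Backbone services are from the article; all other edges are hypothetical.
DEPS = {
    "supply-chain": set(),
    "inventory": set(),
    "product-center": set(),
    "procurement": {"supply-chain", "product-center"},
    "quality-control": {"procurement", "inventory"},
    "finance": {"procurement"},
}


def deployment_waves(deps):
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(deployment_waves(DEPS), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Services within one wave have no dependencies on each other, so they can be deployed in parallel, subject to the priority and resource constraints mentioned above.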

4. Regression Data Construction / Migration

After service deployment, data construction is required. Configuration data is largely prepared during deployment. Business data can be generated either by DBA‑driven migration from production/testing environments or by synthetic data creation (manual API calls, data‑generation platforms, or automation).

With both configuration and business data in place, the regression environment becomes fully usable.
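For the synthetic data route, a small seeded generator is often enough to get started. The sketch below produces reproducible purchase‑order records. The record shape, the `REG-PO-` prefix, and the SKU and warehouse values are assumptions made for illustration; a real generator would write through the services' own APIs or a data‑generation platform.

```python
import random
from dataclasses import dataclass


@dataclass
class PurchaseOrder:
    """Hypothetical record shape, not Yanxuan's actual schema."""
    order_id: str
    sku: str
    quantity: int
    warehouse: str


def generate_orders(count: int, seed: int = 42) -> list:
    """Generate reproducible synthetic orders; the same seed gives the same data."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    skus = [f"SKU-{i:04d}" for i in range(1, 21)]
    warehouses = ["WH-A", "WH-B"]
    return [
        PurchaseOrder(
            order_id=f"REG-PO-{n:06d}",  # prefix marks rows as synthetic
            sku=rng.choice(skus),
            quantity=rng.randint(1, 500),
            warehouse=rng.choice(warehouses),
        )
        for n in range(1, count + 1)
    ]
```

Seeding the generator keeps regression runs comparable, and the identifiable prefix makes synthetic rows easy to find and clean up later.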

Phase Results

After about a month of intensive deployment, the number of services in the regression environment met the quantitative target, and the supply‑chain domain reached 100% deployment of its planned services.

Overall, more than 90 % of the deployed services run in the cloud‑inside environment.

Core services (procurement, product, quality control, supplier, inventory, finance, data services) have passed initial QA verification, and several independent services are ready for pre‑release regression testing.

Future Outlook

The next immediate goal is to fully utilize the regression environment for B‑side services, eliminating environment mixing and testing bottlenecks. Subsequent phases will continue to identify regression‑required services among the >1,000 total services and deploy them.

Additional work includes defining responsibilities for the integration, testing, and regression environments, ensuring environment stability, and automating data generation. Long‑term plans involve lane isolation, reuse of shared base services, traffic‑coloring, and one‑click environment cloning and destruction, with the aim of flexible, resource‑efficient multi‑environment usage.

Tags: cloud migration, regression testing, infrastructure, environment governance