How We Built a Stable Regression Environment for Over 1,000 Services

This article details the challenges, planning, and step‑by‑step execution of creating an isolated regression environment for a large micro‑service platform, covering quantifiable goals, solution evaluation, infrastructure refactoring, data migration, deployment results, and future improvements.

Yanxuan Tech Team

Background

Managing offline environments for the Yanxuan platform is a long‑term, complex task, and building a regression environment became its most urgent, actionable goal. After months of trial and error, the team defined quantitative targets, drafted an implementation plan, and worked through four construction steps: cluster setup, infrastructure, business services, and data.

Why Offline Environment Governance?

Three pain points drove this work: integration, testing, and regression environments were mixed together, which lowered efficiency; environments were unstable, which led to service outages; and historical operations were not standardized, which prevented rapid environment replication.

To address these, Yanxuan planned a dedicated regression environment in 2020 so that core services would have a stable platform for pre‑release verification, free from interference from other environments.

Why Move the Offline Environment to the Cloud?

Two deployment models exist: VM‑based (cloud‑outside) and container‑based (cloud‑inside). Cloud‑inside offers lighter weight, easier scaling, and faster environment cloning and destruction.

Although many services have already migrated to the cloud, only a fraction of the total services are cloud‑native. The goal is to prioritize cloud‑inside deployment for the regression environment, extending existing cloud services and encouraging non‑cloud services to migrate.

How We Conducted Regression Environment Construction

Quantifiable Goal

The most blockage‑prone services are the B‑side supply‑chain core services (≈75 services) with long dependency chains covering procurement, planning, inventory, quality control, distribution, packaging, and supply‑chain management.

We identified roughly 180 core‑link services across the platform and set the goal of building an isolated regression environment for them, with the B‑side services as the primary focus.

Feasible Solution

Two options were evaluated:

Option 1: Build an independent, data‑isolated environment.

Option 2: Use traffic‑shading to create a dynamic, extensible cluster containing only frequently changed services.

Option 1 guarantees data isolation and stability but incurs higher resource costs. Option 2 saves resources but depends on accurate service identification and mature traffic‑shading technology, which was not yet fully reliable.

After analysis, we chose Option 1.
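To make the trade‑off concrete, the following is a minimal sketch of the kind of routing Option 2 would depend on: each request carries an environment tag, and a call goes to the tagged instance of a service if one exists, otherwise to the shared baseline instance. The header name `x-env-lane`, the `Registry` class, and the addresses are all hypothetical and are not taken from Yanxuan's implementation.

```python
from dataclasses import dataclass, field

BASELINE = "baseline"  # hypothetical name for the shared, stable cluster


@dataclass
class Registry:
    # service name -> {lane name -> instance address}
    instances: dict[str, dict[str, str]] = field(default_factory=dict)

    def register(self, service: str, lane: str, address: str) -> None:
        self.instances.setdefault(service, {})[lane] = address

    def resolve(self, service: str, lane: str) -> str:
        lanes = self.instances[service]
        # Prefer the instance deployed in the request's lane; otherwise fall
        # back to baseline. A service with no baseline instance raises KeyError.
        return lanes.get(lane, lanes[BASELINE])


def route(registry: Registry, call_chain: list[str], headers: dict[str, str]) -> list[str]:
    """Resolve every hop of a call chain using the lane tag on the request."""
    lane = headers.get("x-env-lane", BASELINE)  # hypothetical header name
    return [registry.resolve(service, lane) for service in call_chain]


if __name__ == "__main__":
    registry = Registry()
    registry.register("order", BASELINE, "order-base:8080")
    registry.register("inventory", BASELINE, "inventory-base:8080")
    registry.register("inventory", "feature-a", "inventory-feature-a:8080")
    print(route(registry, ["order", "inventory"], {"x-env-lane": "feature-a"}))
```

The sketch also shows why Option 2 is fragile: the tag has to survive every hop (including messaging and scheduled jobs), and all lanes still share the baseline's data, which is exactly the isolation that Option 1 provides by construction.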

Implementation Steps

1. Regression Cluster Pre‑setup

The operations team prepared logical deployment units (LDCs) for both cloud‑inside and cloud‑outside services, handling network whitelists and data isolation.

2. Infrastructure Refactoring

Infrastructure services were split into two categories: those tightly coupled with business data (e.g., BPM, workflow, permission center) requiring independent deployment, and shared services (e.g., release system, gateway, configuration center, messaging, scheduling, APM, logging, file upload) that could be reused after configuration checks.

Shared services that lacked multi‑environment support were identified and refactored before being reused.
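The classification above can be sketched as a small planning table. The two categories and the service names come from the text; the `multi_env_ready` input (which shared services already support multiple environments) and the function name are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    DEDICATED = "dedicated"  # coupled with business data: deploy a separate copy
    SHARED = "shared"        # reusable across environments after a config check


# Categories as described in the text; the identifiers are illustrative.
INFRA_MODES = {
    "bpm": Mode.DEDICATED,
    "workflow": Mode.DEDICATED,
    "permission-center": Mode.DEDICATED,
    "release-system": Mode.SHARED,
    "gateway": Mode.SHARED,
    "configuration-center": Mode.SHARED,
    "messaging": Mode.SHARED,
    "scheduling": Mode.SHARED,
    "apm": Mode.SHARED,
    "logging": Mode.SHARED,
    "file-upload": Mode.SHARED,
}


def plan_infrastructure(modes: dict[str, Mode], multi_env_ready: set[str]) -> dict[str, list[str]]:
    """Split infrastructure services into deploy / reuse / refactor-first lists."""
    plan: dict[str, list[str]] = {"deploy_dedicated": [], "reuse": [], "refactor_first": []}
    for name, mode in modes.items():
        if mode is Mode.DEDICATED:
            plan["deploy_dedicated"].append(name)
        elif name in multi_env_ready:
            plan["reuse"].append(name)
        else:
            # Shared service without multi-environment support yet
            plan["refactor_first"].append(name)
    return plan
```

Calling `plan_infrastructure(INFRA_MODES, multi_env_ready={"gateway", "logging"})` would, under these assumed inputs, put the three data‑coupled services in `deploy_dedicated`, `gateway` and `logging` in `reuse`, and the remaining shared services in `refactor_first`.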

3. Business Service Deployment

Approximately 180 core services were prioritized, focusing first on supply‑chain, inventory, product, and finance domains. Dependency analysis using Snet guided deployment order, ensuring critical services were available before downstream ones.
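One common way to turn a dependency analysis into a deployment order is a topological sort, grouped into waves of services that can be deployed in parallel. The sketch below uses Python's standard `graphlib` module; the output format of Snet is not described in this article, so the input mapping and the service names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def deployment_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves; each wave depends only on earlier waves.

    `dependencies` maps a service to the set of services it calls.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if services depend on each other in a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical dependency data for illustration
    deps = {
        "purchase-order": {"inventory", "product"},
        "inventory": {"product"},
        "finance-settlement": {"purchase-order"},
        "product": set(),
    }
    try:
        for number, wave in enumerate(deployment_waves(deps), start=1):
            print(f"wave {number}: {wave}")
    except CycleError as error:
        print("circular dependency, needs manual ordering:", error.args[1])
```

Real service graphs often contain cycles, in which case `prepare()` raises `CycleError` and the services in the loop have to be ordered by hand or deployed together.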

4. Data Construction / Migration

After service deployment, configuration data was largely ready, while business data was either migrated from production/testing databases or generated via data‑generation platforms and automated scripts.
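For the generated portion of the business data, an automated script can be as simple as a seeded generator, so that every rebuild of the environment produces the same records. This is a minimal sketch only: the record fields, the `REG-PO-` prefix, and the function name are invented for illustration and do not reflect Yanxuan's data‑generation platform.

```python
import random


def generate_purchase_orders(count: int, sku_ids: list[str], seed: int = 2020) -> list[dict]:
    """Build deterministic, clearly labeled test purchase orders."""
    rng = random.Random(seed)  # fixed seed: identical data on every run
    return [
        {
            "order_id": f"REG-PO-{index:06d}",  # prefix marks regression-only data
            "sku_id": rng.choice(sku_ids),
            "quantity": rng.randint(1, 500),
            "status": "CREATED",
        }
        for index in range(1, count + 1)
    ]


if __name__ == "__main__":
    for order in generate_purchase_orders(3, ["SKU-1", "SKU-2"]):
        print(order)
```

Deterministic data keeps regression results comparable between runs, and a recognizable ID prefix makes it easy to tell generated rows apart from rows migrated from other databases.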

Results

Within a month, the regression environment reached the quantitative target: all identified core services were deployed, with >90% of them running in cloud‑inside mode.

Service availability was validated by QA, and many isolated services are now ready for pre‑release regression testing.

Future Outlook

Upcoming goals include fully utilizing the regression environment for B‑side testing, expanding coverage to all services that need isolated testing, and enhancing capabilities such as lane isolation, advanced traffic‑shading, and one‑click environment provisioning and teardown.

Written by

Yanxuan Tech Team

NetEase Yanxuan Tech Team shares e-commerce tech insights and quality finds for mindful living. This is the public portal for NetEase Yanxuan's technology and product teams, featuring weekly tech articles, team activities, and job postings.
