How Yanxuan Scaled to 1,000 Services with a Cloud‑Native Platform

Facing rapid growth in 2019, Yanxuan partnered with NetEase Qingzhou to co‑build a cloud‑native platform. This article describes the multi‑stage migration, which standardized services, kept code changes small, preserved high availability, tuned performance, and improved observability. The platform now supports over 300 cloud‑migrated services and has boosted development efficiency by more than 200%.

Yanxuan Tech Team

Background

In early 2019 Yanxuan had nearly a thousand services and a rapidly expanding system. To support this growth, Yanxuan and NetEase Qingzhou launched a joint cloud‑native platform project, marking the start of Yanxuan's cloud‑native evolution.

Cloud Native has become a buzzword in recent years. The definitions from Pivotal and the CNCF share one goal: helping applications use cloud infrastructure to become more agile and efficient as business scenarios grow more complex.

Key Simplification Goals

Service mesh evolution to push middleware down to the infrastructure, simplifying non‑business logic within applications.

Containerization to realize immutable infrastructure, simplifying the setup and maintenance of execution environments.

Cloud‑based DevOps to simplify the flow of applications across different lifecycle stages.

Implementation Considerations

Standardization : Repay technical debt and lay the groundwork for immutable infrastructure.

Reduce source code changes : Lower migration cost, ensure normal business iteration, and avoid bugs introduced by migration.

High‑availability migration : Ensure reliable online services.

Performance tuning : Address performance challenges from service mesh and remote data centers.

Practice

Stage 1 – Laying the Foundation

With the Yanxuan DevOps project as the backbone, the team built a near‑uniform application execution environment on top of heterogeneous infrastructure, so that business applications are less directly aware of the infrastructure they run on.

CMDB : Manages relationships among personnel, services, and resources, allowing applications to abstract both cloud and on‑premise resources.

Opera : Product‑centric approach that implements immutable infrastructure across testing, regression, and production, unifying deployment pipelines and reducing migration learning costs.

Log Platform : Provides real‑time log collection, routing, storage, quality checks, and analysis, improving data quality and large‑scale log processing, especially in container environments.

Service Mesh : Replaces the original Consul+NGINX mesh with Istio to better integrate with Kubernetes and Docker.
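
The CMDB item above describes a model that links personnel, services, and resources so that an application sees cloud and on‑premise resources through one abstraction. The following is a minimal sketch of that idea, not Yanxuan's actual schema: the class names, fields, and the `order-service` example are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Location(Enum):
    CLOUD = "cloud"
    ON_PREMISE = "on-premise"


@dataclass(frozen=True)
class Resource:
    """A unit of infrastructure (hypothetical fields)."""
    name: str
    kind: str            # e.g. "vm", "pod", "database"
    location: Location


@dataclass
class Service:
    """A service with its owners and the resources it runs on."""
    name: str
    owners: list[str] = field(default_factory=list)
    resources: list[Resource] = field(default_factory=list)

    def locations(self) -> set[Location]:
        # Callers ask where a service runs without caring how each
        # resource is provisioned underneath.
        return {r.location for r in self.resources}

    def is_hybrid(self) -> bool:
        # True while a service is mid-migration and spans both sides.
        return len(self.locations()) > 1


order = Service(
    name="order-service",
    owners=["alice"],
    resources=[
        Resource("order-vm-1", "vm", Location.ON_PREMISE),
        Resource("order-pod-1", "pod", Location.CLOUD),
    ],
)
```

A deployment tool built on such a model could treat `order.is_hybrid()` as a signal that the service is partway through migration and still needs both deployment paths.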

Stage 2 – Building the Framework

Engineering transformation guidelines covering base image standards, CI standards, and platform‑level governance.

Deployment verification guidelines for resource/permission requests, gray‑release validation, and recycling processes, enabling automated pipelines.

Traffic control (API gateway) using an Envoy‑based cloud‑native gateway that preserves existing business capabilities.
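
Gray‑release validation, mentioned in the deployment guidelines above, comes down to sending a controlled share of traffic to the new version. The sketch below illustrates one common way to do that, a deterministic hash split by user ID. It is an assumption for illustration, not the mechanism of the Yanxuan gateway or of Envoy; the function names and the `stable`/`gray` labels are hypothetical.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def pick_version(user_id: str, gray_percent: int) -> str:
    """Route roughly gray_percent of users to the gray version."""
    if not 0 <= gray_percent <= 100:
        raise ValueError("gray_percent must be between 0 and 100")
    # The same user always lands in the same bucket, so raising the
    # percentage only adds users to the gray group and never flips
    # an existing gray user back to stable.
    return "gray" if bucket(user_id) < gray_percent else "stable"
```

Hashing on a stable key rather than choosing randomly per request keeps each user's experience consistent while the rollout percentage is raised step by step.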

Stage 3 – Building the Pipeline

Backing Service Cloud‑ification : Migrate databases, caches, and MQ to the cloud, reducing latency and improving scaling.

Service Mesh Optimization : Implement hot‑upgrade mechanisms, gray‑release, configuration slimming, and SR‑IOV container networking for high‑performance nodes.

SNest Service Portal : Unified service governance covering definition, lifecycle, versioning, migration, registration, monitoring, and ownership.

Consumer‑facing (C‑end) Activity Cloud Migration : Pilot migration for high‑traffic events such as Double‑11.

Stage 4 – Building the Environment

Development Environment : Pure cloud environment for testing infrastructure and enabling cloud‑native refactoring.

Regression Environment : Simulates heterogeneous on‑premise and cloud infrastructure to provide near‑production scenarios for large‑scale migration.

Some Gains

IP address handling: containerization required introducing IPRange mechanisms and shifting to domain‑based or service‑mesh routing.

Token‑based service authentication replaces IP‑based checks, reducing transformation cost and improving security.

Observability improvements: increased infrastructure complexity called for new monitoring and diagnostic practices, including the integration of logs, alerts, and Kubernetes events into a unified monitoring system.

Team impact: faster scaling, deeper infra layering, non‑intrusive middleware, refined CI/CD standards, and enhanced resource governance.
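
The first two points above contrast two ways of deciding whether a caller is trusted: matching its address against an IP range, or verifying a token it presents. The sketch below shows both side by side using only the Python standard library. It is an illustration under stated assumptions, not Yanxuan's implementation: the token layout (`service:expiry:signature`), the shared‑secret HMAC scheme, and all function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress
from typing import Optional


def ip_allowed(client_ip: str, allowed_ranges: list[str]) -> bool:
    """IP-range check: is the caller inside any allowed CIDR block?"""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(r) for r in allowed_ranges)


def _sign(payload: str, secret: bytes) -> str:
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(service: str, secret: bytes, expires_at: int) -> str:
    """Build a token of the (assumed) form service:expiry:signature."""
    payload = f"{service}:{expires_at}"
    return f"{payload}:{_sign(payload, secret)}"


def verify_token(token: str, secret: bytes, now: int) -> Optional[str]:
    """Return the caller's service name if the token is valid, else None."""
    try:
        service, expires_raw, sig = token.rsplit(":", 2)
        expires_at = int(expires_raw)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{service}:{expires_raw}", secret)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None
    if now >= expires_at:
        return None  # expired
    return service
```

The token check depends only on what the caller presents, not on where it runs, which is why it keeps working when container IPs change on every reschedule.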

Future Plans

Yanxuan now runs over 300 services in the cloud, handling billions of calls per day. CI/CD adoption is above 99%, thousands of pipelines run on high‑availability GitLab runners, and development efficiency has improved by more than 200%.

Upcoming focus includes completing the regression environment, extending cloud‑native migration to performance‑sensitive applications, and leveraging service‑mesh capabilities to create version‑based test environments for faster cluster provisioning.
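
One way a service mesh can provide version‑based test environments is to route each request by a version tag and fall back to a shared baseline for any service that has no such version deployed. The sketch below illustrates that routing rule. It is an assumption about how the plan could work, not a description of Yanxuan's design; the `baseline` label, the `deployed` mapping, and the function name are hypothetical.

```python
BASELINE = "baseline"


def resolve_instance(
    service: str,
    requested_version: str,
    deployed: dict[str, set[str]],
) -> str:
    """Pick the instance group that should serve this request.

    `deployed` maps each service name to the versions currently running.
    """
    versions = deployed.get(service, set())
    if requested_version in versions:
        chosen = requested_version
    elif BASELINE in versions:
        # Services untouched by the feature under test are shared.
        chosen = BASELINE
    else:
        raise LookupError(f"no usable instance for {service!r}")
    return f"{service}@{chosen}"


deployed = {
    "cart": {"baseline", "feature-x"},
    "order": {"baseline"},
}
```

With this rule, a test environment for `feature-x` only needs the changed services deployed; a call chain through `cart` and `order` resolves to `cart@feature-x` and `order@baseline`, which is what would make provisioning faster than cloning a full cluster.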
