How Yanxuan Scaled to 1,000 Services with a Cloud‑Native Platform
Facing rapid growth in 2019, Yanxuan partnered with NetEase Qingzhou to co‑build a cloud‑native platform. This article describes the multi‑stage migration, which standardized services, kept source code changes small, preserved high availability, tuned performance, and improved observability. The platform now supports over 300 cloud‑migrated services and has raised development efficiency by more than 200%.
Background
In early 2019 Yanxuan had nearly a thousand services and a rapidly expanding system. To support this growth, Yanxuan and NetEase Qingzhou launched a joint cloud‑native platform project, marking the start of Yanxuan's cloud‑native evolution.
Cloud native has become a buzzword in recent years. The definitions from Pivotal and CNCF share one goal: helping applications use cloud infrastructure to gain agility and efficiency as business scenarios grow more complex.
Key Simplification Goals
Service mesh evolution to push middleware down to the infrastructure, simplifying non‑business logic within applications.
Containerization to realize immutable infrastructure, simplifying the setup and maintenance of execution environments.
Cloud‑based DevOps to simplify the flow of applications across different lifecycle stages.
Implementation Considerations
Standardization : Repay technical debt and lay the groundwork for immutable infrastructure.
Reduce source code changes : Lower migration cost, ensure normal business iteration, and avoid bugs introduced by migration.
High‑availability migration : Ensure reliable online services.
Performance tuning : Address performance challenges from service mesh and remote data centers.
Practice
Stage 1 – Laying the Foundation
Using the Yanxuan DevOps project as the backbone, the team built a roughly uniform application execution environment on top of heterogeneous infrastructure, so that business applications are less directly exposed to infrastructure differences.
CMDB : Manages relationships among personnel, services, and resources, allowing applications to abstract both cloud and on‑premise resources.
Opera : Product‑centric approach that implements immutable infrastructure across testing, regression, and production, unifying deployment pipelines and reducing migration learning costs.
Log Platform : Provides real‑time log collection, routing, storage, quality checks, and analysis, improving data quality and large‑scale log processing, especially in container environments.
Service Mesh : Replaces the original Consul+NGINX mesh with Istio to better integrate with Kubernetes and Docker.
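The CMDB item above can be pictured as a small data model. What follows is a minimal sketch, not Yanxuan's actual schema: the class names (`Resource`, `Service`, `Cmdb`), the field names, and the sample addresses are all hypothetical. It only illustrates how linking people, services, and resources lets a caller look up a service's addresses without knowing whether each one lives in the cloud or on‑premise.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """A deployable target; `location` is "cloud" or "on_prem" (hypothetical values)."""
    name: str
    location: str
    address: str


@dataclass
class Service:
    """A service record that links owners (people) to resources."""
    name: str
    owners: list[str] = field(default_factory=list)
    resources: list[Resource] = field(default_factory=list)


class Cmdb:
    """Hypothetical in-memory CMDB keyed by service name."""

    def __init__(self) -> None:
        self._services: dict[str, Service] = {}

    def register(self, service: Service) -> None:
        self._services[service.name] = service

    def addresses(self, service_name: str) -> list[str]:
        # Callers receive addresses only; the location stays an internal detail.
        return [r.address for r in self._services[service_name].resources]

    def services_owned_by(self, person: str) -> list[str]:
        return sorted(s.name for s in self._services.values() if person in s.owners)


cmdb = Cmdb()
cmdb.register(Service(
    name="order-service",
    owners=["alice"],
    resources=[
        Resource("vm-01", "on_prem", "10.0.0.5:8080"),
        Resource("pod-a", "cloud", "order-service.default.svc:8080"),
    ],
))
```

In this sketch a migration from on‑premise to cloud changes only the `Resource` entries; code that calls `cmdb.addresses("order-service")` stays the same.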
Stage 2 – Building the Framework
Engineering transformation guidelines covering base image standards, CI standards, and platform‑level governance.
Deployment verification guidelines for resource/permission requests, gray‑release validation, and recycling processes, enabling automated pipelines.
Traffic control (API gateway) using Envoy‑based cloud‑native gateway, preserving existing business capabilities.
Stage 3 – Building the Pipeline
Backing Service Cloud Migration : Migrate databases, caches, and MQ to the cloud, reducing latency and improving scalability.
Service Mesh Optimization : Implement hot‑upgrade mechanisms, gray‑release, configuration slimming, and SR‑IOV container networking for high‑performance nodes.
SNest Service Portal : Unified service governance covering definition, lifecycle, versioning, migration, registration, monitoring, and ownership.
Consumer‑facing (C‑end) Campaign Cloud Migration : Pilot for high‑traffic events such as Double 11.
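The gray‑release capability mentioned in the service mesh item above is usually a matter of splitting traffic between versions by weight. In a mesh such as Istio the proxy does this from routing configuration, so the following is only an illustration of the idea in application code. The function name `pick_version`, the version labels, and the choice of a hashed request key are assumptions, not details from the source.

```python
from __future__ import annotations

import hashlib


def pick_version(request_key: str, weights: dict[str, int]) -> str:
    """Deterministically map a request key to a version by weight.

    `weights` maps a version name to an integer percentage; the values
    must sum to 100.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    # A stable hash keeps the same key (e.g. a user id) on the same version.
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    upper = 0
    for version, weight in sorted(weights.items()):
        upper += weight
        if bucket < upper:
            return version
    raise AssertionError("unreachable when weights sum to 100")


# Send roughly 10% of keys to the new version.
gray_weights = {"v1": 90, "v2": 10}
chosen = pick_version("user-42", gray_weights)
```

Hashing a stable key rather than drawing a random number means a given caller sees a consistent version during the rollout, which tends to make gray‑release validation easier to reason about.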
Stage 4 – Building the Environment
Development Environment : Pure cloud environment for testing infrastructure and enabling cloud‑native refactoring.
Regression (Return) Environment : Simulates heterogeneous on‑premise and cloud infrastructure to provide near‑production scenarios for large‑scale migration.
Some Gains
IP address handling: because container IPs change as instances are rescheduled, containerization required introducing IPRange mechanisms and shifting to domain‑based or service‑mesh routing.
Token‑based service authentication replaces IP‑based checks, reducing transformation cost and improving security.
Observability improvements: increased infrastructure complexity called for new monitoring and diagnostic practices, including the integration of logs, alerts, and Kubernetes events into a unified monitoring system.
Team impact: faster scaling, deeper infra layering, non‑intrusive middleware, refined CI/CD standards, and enhanced resource governance.
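The first two items in the list above contrast address‑based trust with token‑based trust. The sketch below shows both under stated assumptions: the source does not describe Yanxuan's IPRange format or token scheme, so CIDR blocks and an HMAC‑signed `service:expiry` payload are stand‑ins, and the function names are hypothetical. A production system would more likely rely on an established token format or on mutual TLS from the mesh.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import ipaddress
import time


def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """IPRange-style check: match a caller IP against CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)


def issue_token(service: str, secret: bytes, ttl_s: int, now: float | None = None) -> str:
    """Sign "<service>:<expiry>" so the caller's identity travels with the request."""
    expires = int((time.time() if now is None else now) + ttl_s)
    payload = f"{service}:{expires}".encode("utf-8")
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii")
            + "." + base64.urlsafe_b64encode(sig).decode("ascii"))


def verify_token(token: str, secret: bytes, now: float | None = None) -> str | None:
    """Return the caller's service name, or None if the token is invalid or expired."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
        service, expires = payload.decode("utf-8").rsplit(":", 1)
        expires_at = int(expires)
    except ValueError:
        # Covers malformed structure, bad base64, bad UTF-8, and a non-numeric expiry.
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    current = time.time() if now is None else now
    return service if current < expires_at else None
```

The range check still depends on where the caller runs, so it has to be updated whenever pod networks change. The token check depends only on a shared secret, which is one way to read the source's point that token‑based authentication lowers transformation cost.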
Future Plans
Yanxuan now runs over 300 services in the cloud, handling billions of daily calls, with CI/CD adoption above 99% and thousands of pipelines executed via high‑availability GitLab runners, boosting development efficiency by more than 200%.
Upcoming focus includes completing the regression (return) environment, extending cloud‑native migration to performance‑sensitive applications, and using service‑mesh capabilities to create version‑based test environments so that test clusters can be provisioned faster.
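One common way to build version‑based test environments on a mesh is to tag each request with an environment label and route it to a matching service version when one is deployed, falling back to a shared baseline otherwise. The source does not say that Yanxuan plans exactly this, so the sketch below is an assumption about the approach; the `route` function, the `BASELINE` label, and the service names are hypothetical.

```python
from __future__ import annotations

BASELINE = "baseline"  # hypothetical label for the shared default version


def route(service: str, env_tag: str | None, deployed: dict[str, set[str]]) -> str:
    """Pick which deployed version of `service` should handle a request.

    `deployed` maps a service name to the set of version labels running for it.
    """
    versions = deployed.get(service, set())
    # Prefer the version that matches the request's environment tag.
    if env_tag and env_tag in versions:
        return env_tag
    # Otherwise reuse the shared baseline instead of deploying a full copy.
    if BASELINE in versions:
        return BASELINE
    raise LookupError(f"no baseline deployed for {service}")


deployed = {
    "cart": {"baseline", "feature-x"},
    "order": {"baseline"},
}
```

With this scheme, a test environment for `feature-x` needs only the services that actually changed (`cart` here); calls to `order` reuse the baseline, which is why provisioning can be faster than cloning a whole cluster.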
Yanxuan Tech Team
NetEase Yanxuan Tech Team shares e-commerce tech insights and quality finds for mindful living. This is the public portal for NetEase Yanxuan's technology and product teams, featuring weekly tech articles, team activities, and job postings.