
Achieving Full Cloud‑Native Migration: Hangzhou MingShitang’s Journey to 100% SLA

This case study details how Hangzhou MingShitang migrated its entire online‑education platform from self‑hosted IDC infrastructure to Alibaba Cloud, redesigning registration, configuration, micro‑service governance, safe release and gateway layers with MSE, Sentinel and cloud‑native technologies to attain 100% SLA, dramatically cut costs and boost performance.

Alibaba Cloud Native

Background

Before 2022 the system was deployed on a self‑managed IDC using a private Kubernetes cluster. Core components included:

Eureka for service registration

Apollo and Spring Cloud Config for configuration management

Redis, MySQL, MongoDB, Kafka, RabbitMQ, Hadoop for data storage and processing

Spring Cloud Java services with Zuul 1.0 as the gateway and Nginx as the entry point

ELK, Pinpoint and Zabbix for monitoring

Problems with the IDC Architecture

Stability: traffic spikes during peak periods caused frequent outages and SLA violations.

Elasticity: scaling required hours‑long procurement cycles.

Cost: idle IDC machines wasted resources.

Operational complexity: many self‑built services demanded specialized staff and made troubleshooting difficult.

Full Cloud Migration (2022)

A dedicated migration team partnered with Alibaba Cloud to move all workloads to the cloud, establishing the foundation for subsequent cloud‑native transformation.

Infrastructure Refactor – Registration & Configuration Center

The original stack (Eureka, Apollo, Spring Cloud Config) suffered from cluster unavailability and delayed configuration pushes. After evaluation, the team selected Alibaba Cloud MSE Nacos as a unified service registry and configuration center.

Migration steps:

Developed migration tools to export Apollo and Spring Cloud Config data.

Used MSE Sync to replicate Eureka instances to Nacos with zero downtime.

Deployed Nacos namespaces for environments (dev, test, pre‑prod, prod), groups for business lines, and dataId for configuration types.
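The namespace / group / dataId layout described in the last step can be illustrated with a minimal sketch. The helper names (`ConfigCoordinate`, `coordinate_for`), the `_GROUP` suffix and the `.yaml` extension are assumptions made for illustration; the article does not state the team's actual naming convention.

```python
from dataclasses import dataclass

# Environments named in the text; each one maps to its own Nacos namespace.
ENVIRONMENTS = ("dev", "test", "pre-prod", "prod")


@dataclass(frozen=True)
class ConfigCoordinate:
    """Hypothetical helper: the three values that locate one Nacos config."""
    namespace: str  # environment
    group: str      # business line
    data_id: str    # configuration type


def coordinate_for(env: str, business_line: str, config_type: str) -> ConfigCoordinate:
    """Map (environment, business line, config type) to a Nacos coordinate."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    # The group and dataId naming below is an assumed convention.
    return ConfigCoordinate(
        namespace=env,
        group=business_line.upper().replace("-", "_") + "_GROUP",
        data_id=f"{config_type}.yaml",
    )
```

For example, `coordinate_for("prod", "live-class", "datasource")` yields namespace `prod`, group `LIVE_CLASS_GROUP` and dataId `datasource.yaml`, so the same dataId can exist once per environment without collisions.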

Result: registration and configuration SLA reached 100% with no incidents.

Service Governance – High‑Availability Toolbox

Hystrix could not cope with the summer peak of more than 10k QPS. The team adopted Alibaba Sentinel (AHAS edition), which is integrated into MSE traffic governance. Capabilities used:

QPS‑based rate limiting

Concurrency isolation (replaces heavyweight thread‑pool isolation and cuts memory use more than tenfold)

Exception‑based circuit breaking and degradation

Real‑time rule updates without service restart
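To make the first two capabilities concrete, here is a simplified sketch of QPS-based limiting and of concurrency isolation through a counter rather than a per-dependency thread pool. It is not Sentinel's implementation (Sentinel uses sliding-window statistics); the class names and the fixed one-second window are assumptions chosen to keep the example short.

```python
import threading
import time


class QpsLimiter:
    """Fixed one-second window counter: at most `limit` passes per window."""

    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self.window_start = clock()
        self.count = 0
        self.lock = threading.Lock()

    def try_pass(self):
        with self.lock:
            now = self.clock()
            if now - self.window_start >= 1.0:
                # A new window begins: reset the counter.
                self.window_start = now
                self.count = 0
            if self.count < self.limit:
                self.count += 1
                return True
            return False


class ConcurrencyLimiter:
    """Caps in-flight calls with a semaphore instead of a dedicated thread pool."""

    def __init__(self, max_in_flight):
        self._sem = threading.BoundedSemaphore(max_in_flight)

    def call(self, fn, fallback):
        # Reject immediately when the cap is reached rather than queueing.
        if not self._sem.acquire(blocking=False):
            return fallback()
        try:
            return fn()
        finally:
            self._sem.release()
```

Because the concurrency limiter only keeps a counter, the protected call runs on the caller's thread, which is where the memory saving over thread-pool isolation comes from.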

Implementation SOP:

Coarse‑grained rate limiting at the gateway layer, logging hits to SLS and triggering alerts.

Fine‑grained limit‑circuit‑degrade controls at the application layer, leveraging MSE metrics for dynamic adjustments.
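The exception-based circuit breaking used at the application layer can be sketched as a small state machine. This is an illustration under stated assumptions, not Sentinel's code: the thresholds are hypothetical, errors are counted cumulatively since the last reset rather than over a sliding window, and the class is not thread-safe.

```python
import time


class CircuitBreaker:
    """Opens when the error ratio crosses a threshold, then probes after a pause."""

    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, error_ratio=0.5, min_calls=5, open_seconds=10.0,
                 clock=time.monotonic):
        self.error_ratio = error_ratio
        self.min_calls = min_calls
        self.open_seconds = open_seconds
        self.clock = clock
        self.state = self.CLOSED
        self.calls = 0
        self.errors = 0
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                return fallback()          # degrade without calling downstream
            self.state = self.HALF_OPEN    # let one probe request through
        try:
            result = fn()
        except Exception:
            self._on_error()
            return fallback()
        self._on_success()
        return result

    def _on_error(self):
        if self.state == self.HALF_OPEN:
            self._trip()                   # probe failed: open again
            return
        self.calls += 1
        self.errors += 1
        if (self.calls >= self.min_calls
                and self.errors / self.calls >= self.error_ratio):
            self._trip()

    def _on_success(self):
        if self.state == self.HALF_OPEN:
            self.state = self.CLOSED       # probe succeeded: resume traffic
            self.calls = self.errors = 0
        else:
            self.calls += 1

    def _trip(self):
        self.state = self.OPEN
        self.opened_at = self.clock()
        self.calls = self.errors = 0
```

The `fallback` callable plays the role of the degradation path: while the breaker is open, callers get the fallback result instead of waiting on a failing dependency.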

Safe Release – No‑Loss Down/Up

5xx errors were observed during releases due to non‑graceful shutdowns and slow start‑up health checks.

Phase 1 – Custom Solutions

Graceful down: Nacos retains instance metadata for 1 minute, so a Kubernetes preStop hook sleeps for 60 seconds. This keeps each pod alive long enough for in‑flight requests to finish before the process stops.

Graceful up: Simplified /health endpoints, removed heavy logic, and introduced delayed registration for services with long initialization.
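The ordering behind both custom solutions can be sketched as follows. The function names and callback parameters are hypothetical, and the explicit `deregister` step is an assumption; the article only states that the preStop hook sleeps for 60 seconds and that registration is delayed for slow-starting services.

```python
import time


def graceful_down(deregister, stop_server, drain_seconds=60.0, sleep=time.sleep):
    """Stop advertising the instance, keep serving for a while, then stop."""
    deregister()           # 1. callers should stop discovering this instance
    sleep(drain_seconds)   # 2. stay up while callers refresh and requests finish
    stop_server()          # 3. only now shut the process down


def graceful_up(initialize, is_healthy, register,
                retries=30, interval=1.0, sleep=time.sleep):
    """Register only after initialization finished and the health check passes."""
    initialize()
    for _ in range(retries):
        if is_healthy():
            register()     # traffic can arrive from this point on
            return True
        sleep(interval)
    return False           # never registered, so no traffic was routed here
```

In both directions the point is the same: the instance is visible to callers only while it can actually serve requests.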

Phase 2 – Cloud Product Capability

Enabled MSE’s built‑in no‑loss release feature, eliminating the need for custom handling.

Result: >100 Java applications achieved 100 % no‑loss down and up during releases.

Full‑Link Gray Release

The initial in‑house solution used the open‑source Nepxion Discovery framework, which later proved inflexible. The team switched to the MSE gray‑release product, which provides Agent‑based, zero‑code integration for mainstream frameworks. The resulting solution combines the following components:

MST publishing system – dynamic application model.

MST traffic‑governance platform – rule management.

MST unified gateway – Go‑based WASM plugin for traffic tagging (coloring).

MST static rendering service – front‑end gray capability.

MSE Agent – Java service‑to‑service tag propagation.
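The idea behind tagging at the gateway and propagating the tag between services can be shown with a small routing sketch. The header name `x-gray-tag`, the instance dictionaries and the addresses are hypothetical; the MSE Agent performs the equivalent steps inside Java services without code changes.

```python
GRAY_TAG_HEADER = "x-gray-tag"  # hypothetical header name


def pick_instances(instances, headers):
    """Return instances carrying the request's tag, else the untagged baseline."""
    tag = headers.get(GRAY_TAG_HEADER)
    if tag:
        tagged = [i for i in instances if i.get("tag") == tag]
        if tagged:
            return tagged
    # No tag, or no gray version of this service: fall back to baseline.
    return [i for i in instances if not i.get("tag")]


def outbound_headers(inbound_headers):
    """Copy the tag onto downstream calls so every hop sees it."""
    tag = inbound_headers.get(GRAY_TAG_HEADER)
    return {GRAY_TAG_HEADER: tag} if tag else {}
```

The baseline fallback matters for full-link gray release: only the services that changed need a gray deployment, while every other hop keeps serving tagged requests from its normal instances and forwards the tag unchanged.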

Release workflow: internal validation → staged rollout (1 % → 5 % → 10 % → full) with continuous monitoring.
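One common way to implement such a staged rollout is deterministic hash bucketing, sketched below. The article does not say how traffic is split at each stage, so the hashing scheme and the function names are assumptions.

```python
import hashlib

STAGES = (1, 5, 10, 100)  # rollout percentages from the workflow above


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_gray(user_id: str, percent: int) -> bool:
    """A user is in the gray group when their bucket is below the percentage."""
    return bucket(user_id) < percent
```

Because each user's bucket never changes, anyone included at 1 % stays included at 5 % and 10 %, so raising the percentage only adds users and nobody flips back and forth between versions during the rollout.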

Cloud‑Native Gateway Consolidation

Three generations of the unified entry layer:

2018‑2019: Nginx + Spring Cloud Zuul 1.0 – high configuration complexity, no hot‑load.

2022: Nginx + APISIX + Zuul 1.0 – added flexibility but introduced etcd management overhead.

2023: MSE cloud‑native gateway (the commercial edition of Higress) – merges the traffic gateway and the business gateway into a single layer.

Key outcomes after migration to the MSE gateway:

SLA improved to 100 %.

Financial cost reduced by 67 % and compute cost by 75 %.

Average request latency decreased by ~5 ms.

High availability achieved through HTTPS hardware acceleration, kernel tuning, and Envoy parameter optimization.

Extensibility improved: the WASM gray‑release plugin was migrated to the cloud‑native gateway, where upgrades and rollbacks complete within seconds.

Results Summary

Registration & configuration center SLA: 100 %.

Service governance: dozens of Sentinel rules deployed, eliminating incidents caused by traffic spikes or downstream slow calls.

Release safety: more than 100 Java services with 100 % no‑loss down/up.

Gray release: full‑link traffic tagging with staged rollout and instant rule updates.

Gateway: 100 % SLA, 67 % cost reduction, 75 % compute reduction, ~5 ms latency improvement.

Future Direction

With stability secured, the focus shifts to improving development‑test quality, accelerating iteration, and exploring AI‑driven innovations. The organization plans to deepen integration of cloud computing and AI to drive further educational innovation.
