How Nacos and Spring Cloud Alibaba Powered MasterClass’s Cloud‑Native Microservice Migration
This article details MasterClass Education’s end‑to‑end cloud‑native migration using Spring Cloud Alibaba, Nacos, Sentinel and related tools, covering registry selection, Nacos server deployment, monitoring, logging, performance testing, high‑availability sync with Eureka, CI/CD pipelines, gray releases, APM integration and disaster‑recovery drills.
MasterClass Education transformed its online‑learning platform into a cloud‑native microservice architecture, leveraging Spring Cloud Alibaba, Nacos, Sentinel, Arthas and other components, and deploying on Docker and Alibaba Cloud Kubernetes.
Background
Rapid business growth and pandemic‑driven demand increased traffic and microservice count, exposing the limitations of the legacy Eureka registry and prompting a move to a more robust solution.
Why Choose Spring Cloud Alibaba & Nacos?
After evaluating Alibaba Nacos, HashiCorp Consul and Kubernetes CoreDNS, the team selected Nacos for its support of both AP and CP consistency modes, multi‑language access through DNS‑F, load balancing, avalanche protection and multi‑data‑center support. It also fits the existing Spring Cloud Alibaba stack.
Conclusion on Registry Comparison
Nacos satisfies MasterClass’s service‑governance needs, enables smooth migration from Eureka, and benefits from an active community and rich feature set.
Nacos Server Deployment
Four isolated environments (DEV, FAT, UAT, PROD) are deployed; DEV uses a single node, while the others run three‑node clusters accessed via domain names behind SLB load balancers. Both SDK and dashboard connections use the domain‑based endpoints.
Environment & Domain Isolation
Namespaces isolate services per environment; the enabled flag in NacosDiscoveryProperties can disable local connections when needed.
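As a rough illustration of per-environment isolation, the Python sketch below maps each environment to its own endpoint and namespace, and returns nothing when discovery is switched off. The domain names, namespace IDs and the `client_settings` helper are hypothetical; the article does not give the real values.

```python
# Hypothetical endpoints and namespace IDs, one entry per environment.
ENVIRONMENTS = {
    "DEV": {"server": "nacos-dev.example.com:8848", "namespace": "dev"},
    "FAT": {"server": "nacos-fat.example.com:8848", "namespace": "fat"},
    "UAT": {"server": "nacos-uat.example.com:8848", "namespace": "uat"},
    "PROD": {"server": "nacos-prod.example.com:8848", "namespace": "prod"},
}


def client_settings(env, enabled=True):
    """Return connection settings for one environment.

    Returns None when discovery is disabled, mirroring the idea of an
    'enabled' flag that stops a local process from connecting.
    """
    if not enabled:
        return None
    cfg = ENVIRONMENTS[env.upper()]
    return {"server_addresses": cfg["server"], "namespace": cfg["namespace"]}
```

With the Python SDK, the returned values could be passed on as `nacos.NacosClient(settings["server_addresses"], namespace=settings["namespace"])`.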
LDAP Integration
The dashboard authenticates users against the corporate LDAP directory and records first‑login information.
Dashboard Permissions
Default users have read‑only access; only administrators can modify services.
Service Overview
The dashboard displays total services and instances, refreshed every five seconds.
Monitoring
Standard monitoring uses the company’s Prometheus, Grafana and Alertmanager stack. Advanced monitoring follows the Nacos monitoring guide, exposing custom metrics to Prometheus and visualizing them in Grafana.
Instance State Monitoring
Heartbeat, registration, deregistration and timeout events are tracked to ensure service health.
Logging
Logs from all Nacos modules are merged by level (info, warn, error), tagged with a schema field, and output in JSON format for ELK ingestion.
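To make the log shape concrete, here is a minimal Python sketch that renders each record as one JSON object per line. The field names (`schema`, `module`, `level`, `message`) and the default schema value `nacos` are assumptions for illustration; the article does not list the actual fields.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line for ELK ingestion."""

    def __init__(self, schema):
        super().__init__()
        self.schema = schema  # hypothetical tag identifying the log source

    def format(self, record):
        payload = {
            "schema": self.schema,
            "module": record.name,
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name, schema="nacos"):
    """Create a logger whose output is JSON, one object per line."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(schema))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Emitting one object per line keeps the shipper configuration simple, since each line can be parsed independently.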
Alerting
Alerts fire on service instance up/down events and on naming convention violations (e.g., uppercase service names).
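The naming check can be sketched in a few lines of Python. Only the uppercase rule comes from the article; the function name `naming_violations` is hypothetical.

```python
def naming_violations(service_names):
    """Return the service names that contain uppercase characters."""
    return [name for name in service_names if name != name.lower()]
```

For example, `naming_violations(["order-service", "Pay-Service"])` flags only `Pay-Service`, which could then be handed to the alerting channel.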
Performance Testing
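    import random
    import time

    import nacos  # nacos-sdk-python

    # nacos_host (the Nacos server address) is defined elsewhere in the original script.

    def registry(ip):
        # Pick a random service name from a semicolon-separated list.
        with open("service_name.txt", "r") as fo:
            service_name_list = fo.read().split(";")
        service_name = random.choice(service_name_list)

        client = nacos.NacosClient(nacos_host, namespace='')
        # Register the instance: port 333, cluster "default", weight 1.0, enabled and healthy.
        print(client.add_naming_instance(service_name, ip, 333, "default", 1.0, {'preserved.ip.delete.timeout': 86400000}, True, True))
        # Send a heartbeat every five seconds to keep the instance registered.
        while True:
            print(client.send_heartbeat(service_name, ip, 333, "default", 1.0, "{}"))
            time.sleep(5)

Testing a three‑node Nacos cluster of 1C4G (1 CPU core, 4 GB memory) machines with 1,499 services and 12,715 instances showed CPU and memory staying within acceptable ranges, confirming that Nacos performs well enough for this workload.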
Nacos‑Eureka Sync Implementation
To achieve bidirectional synchronization between Nacos and Eureka, several designs were explored:
Official sync solution – simple but not HA and failed under load.
High‑availability consistent‑hash + Zookeeper – distributes sync tasks using a hash ring and watches Zookeeper nodes for failures.
Primary‑backup + Zookeeper – higher cost and less elegant, ultimately discarded.
Consistent‑hash + Etcd – persists service lists in Etcd, uses Etcd watches for re‑hashing, and serves as the bridge for bidirectional sync.
Sync Architecture
Sync services pull service lists from both registries, persist tasks in Etcd, and watch Etcd for node health. Consistent‑hash routing assigns tasks to live nodes; failed nodes trigger re‑hashing and task redistribution. Etcd leases with TTL ensure rapid detection of node loss.
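The routing and re-hashing idea can be sketched in Python as below. This is a minimal, in-memory illustration of consistent hashing with virtual nodes, not the team's implementation: the class and function names, the MD5 hash and the 100 virtual nodes per sync node are all assumptions, and the Etcd watch that would trigger `remove` is left out.

```python
import bisect
import hashlib


def _hash(key):
    # Stable hash so every sync node computes the same ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per real node (assumed value)
        self._keys = []           # sorted hashes of all virtual nodes
        self._owners = {}         # virtual-node hash -> real node
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            h = _hash(f"{node}#{i}")
            self._owners[h] = node
            bisect.insort(self._keys, h)

    def remove(self, node):
        # Called when a sync node is detected as lost.
        for i in range(self.replicas):
            h = _hash(f"{node}#{i}")
            del self._owners[h]
            self._keys.remove(h)

    def owner(self, task):
        # Walk clockwise to the first virtual node at or after the task hash.
        idx = bisect.bisect_left(self._keys, _hash(task)) % len(self._keys)
        return self._owners[self._keys[idx]]


def assign(ring, tasks):
    """Map every sync task to the live node responsible for it."""
    return {task: ring.owner(task) for task in tasks}
```

When a node is removed, only the tasks it owned move to other nodes; tasks on surviving nodes keep their owner, which is what keeps redistribution cheap after a failure.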
Sync UI and Monitoring
The UI shows real‑time sync status, allows manual addition/removal of sync tasks, and integrates with the DevOps release platform for automated task creation when services are deployed.
Alerting and Upgrade Drills
Extensive disaster‑recovery drills demonstrated that the sync cluster could tolerate up to eight of nine nodes failing, with automatic re‑hashing and recovery times under one minute. Upgrade procedures for FAT, UAT and PROD environments completed without service disruption.
Solar Cloud‑Native Microservice Practice
MasterClass built the “Solar” framework on top of Spring Cloud Alibaba, providing SDKs for Nacos, Sentinel, gray‑release, environment isolation, and DevOps integration.
Nacos SDK
Encapsulates four environment domains, supports cross‑registry blue‑green releases, integrates with SkyWalking for tracing, and offers annotations like @EnableSolarService to simplify usage.
Sentinel SDK
Provides environment‑aware Sentinel endpoints, persists rules to Apollo, supports OpenTracing & SkyWalking integration, and adds cluster‑level flow control and custom limit‑app rules for gray releases.
Gray Release & Environment Isolation
Uses Spring Cloud Alibaba, Nacos SDK and Nepxion Discovery to implement version‑based and region‑based routing, with UI panels for condition‑driven and weight‑based traffic splitting.
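Weight-based splitting combined with version-based filtering can be sketched in Python as follows. The metadata key `version`, the instance dictionaries and both function names are assumptions for illustration; the actual routing is handled by the components named above.

```python
import random


def pick_version(weights, rng=random):
    """Choose a version label with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def route(instances, version):
    """Keep only the instances whose metadata carries the chosen version."""
    return [inst for inst in instances if inst["metadata"].get("version") == version]
```

A call such as `route(instances, pick_version({"1.0": 90, "1.1": 10}))` would send roughly one request in ten to the gray version.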
DevOps Release Platform
Offers semi‑automatic gray releases, rolling no‑downtime deployments, and integrates with the sync system to keep registries consistent.
APM with SkyWalking
Collects service, instance and endpoint metrics, visualizes anomalies, and correlates logs and traces. Sentinel metrics are also exported to SkyWalking.
Application Diagnosis (Arthas & Bistoury)
Provides a web console for JVM diagnostics, hotspot analysis and online debugging without SSH access.
CI/CD Pipeline
Jenkins builds JARs and Docker images, pushes them to Harbor, and a custom Hyperion service invokes Alibaba Cloud Kubernetes APIs for deployment. Harbor events are fed back to the CI/CD UI for real‑time status.
Log Collection
Pods write logs to a shared path; Filebeat on each node ships logs to Kafka, GoHangout consumers persist them to Elasticsearch, and Kibana visualizes the data. The pipeline scales without performance issues.
Elastic Scaling & Self‑Healing
Metrics from Sentinel, monitoring and custom health checks drive horizontal pod autoscaling and automated recovery actions.
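The article does not describe the scaling rule itself. As a hedged illustration, the Python sketch below applies the standard Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current replicas × current metric / target metric), with a tolerance band; the function name, the replica bounds and the metric values are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Compute a replica count from the ratio of observed to target metric."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))
```

For instance, four pods observing twice the target load would scale to eight, while a 5% deviation leaves the count unchanged.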
Network Migration
Alibaba Cloud Terway CNI provides flat networking; SLB automatically tracks pod IP changes, enabling seamless migration to Kubernetes.
Conclusion
MasterClass’s cloud‑native journey demonstrates a comprehensive, production‑grade migration from Eureka to Nacos, high‑availability sync, observability, automated release, and resilient operations, offering practical guidance for teams adopting Spring Cloud Alibaba‑based microservices on Kubernetes.
Alibaba Cloud Native