
How a Smart‑Home IoT Platform Mastered Cloud‑Native Migration and Zero‑Loss Deployments

This article details how Chuangmi Smart Home transformed its microservice architecture into a cloud‑native system using Alibaba Cloud ACK, MSE traffic governance, SkyWalking observability, and automated CI/CD pipelines, achieving stable, zero‑loss releases for millions of IoT devices worldwide.

Alibaba Cloud Native

New Business and Emerging Challenges

Since 2019, Chuangmi Smart Home has focused its R&D on a proprietary app and the associated IoT devices, deploying services across four Alibaba Cloud ACK Pro Kubernetes clusters in different regions. The initial stack (Spring Cloud, Eureka, and Apollo) proved unstable, risky to release, and costly to maintain.

Cloud‑Native Exploration

In 2021 the team migrated from Spring Cloud to Spring Cloud Alibaba, replacing both Eureka (service registry) and Apollo (configuration center) with Nacos to relieve registration pressure and reduce disk usage. Rather than self-hosting Nacos, they chose the managed Alibaba Cloud MSE Nacos Professional service, gaining high availability without the operational overhead of running it themselves.

Architecture diagram of Nacos migration

Full‑Link Traffic Governance

IoT traffic includes HTTP requests from apps as well as MQTT, AMQP, and HTTP/2 push messages from devices. A unified upstream message bus tags and filters messages, then uses regular expressions to route them to the appropriate services. For traffic governance, the team evaluated invasive code modifications, custom load balancers, and an Istio service mesh, but rejected all three because of performance limits and configuration complexity, choosing Alibaba Cloud MSE microservice governance instead.
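The tag-then-route step can be pictured with a small sketch. The article does not show the actual message bus, so everything here is an assumption made for illustration: the topic layout, the tag names, the service names, and the `Message`, `tag`, and `route` helpers are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    protocol: str                     # e.g. "mqtt", "amqp", "http2"
    topic: str                        # hypothetical topic layout: device/<id>/...
    payload: bytes = b""
    tags: dict = field(default_factory=dict)


# Hypothetical routing table: first matching pattern wins.
ROUTES = [
    (re.compile(r"^device/[^/]+/telemetry$"), "telemetry-service"),
    (re.compile(r"^device/[^/]+/event/.+$"), "event-service"),
    (re.compile(r"^device/[^/]+/ota/.+$"), "ota-service"),
]


def tag(message: Message) -> Message:
    """Attach routing-relevant tags derived from the message itself."""
    message.tags["protocol"] = message.protocol
    parts = message.topic.split("/")
    if len(parts) >= 2 and parts[0] == "device":
        message.tags["device_id"] = parts[1]
    return message


def route(message: Message):
    """Return the target service name, or None if no pattern matches."""
    for pattern, service in ROUTES:
        if pattern.match(message.topic):
            return service
    return None  # unmatched messages are filtered out by the caller
```

Keeping the patterns in an ordered table makes the routing rules data rather than code, so a new message type only needs a new entry.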

Using multi‑domain, multi‑tenant gateways combined with MSE, they built isolated namespaces and clusters for development, testing, gray‑release, and baseline environments. Traffic is gradually shifted from baseline to gray, then fully promoted after validation, ensuring zero‑downtime releases.
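The gradual shift from baseline to gray can be illustrated with a generic percentage-based splitter. This is not MSE's mechanism, which the article does not describe in detail. It is a minimal sketch assuming traffic is split by a stable key such as a device ID, and the `bucket` and `choose_environment` functions are hypothetical names.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a key to a stable bucket in [0, 100) using a hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_environment(key: str, gray_percent: int) -> str:
    """Send roughly gray_percent of keys to the gray environment."""
    if not 0 <= gray_percent <= 100:
        raise ValueError("gray_percent must be between 0 and 100")
    return "gray" if bucket(key) < gray_percent else "baseline"
```

Because the bucket depends only on the key, a device that lands in the gray environment at 10% stays there as the percentage rises, which keeps its behavior consistent while a release is being validated.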

Traffic governance workflow

Lossless Scaling and Release

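To avoid request loss during pod scaling or version upgrades, the team used MSE's lossless online/offline (graceful startup and shutdown) and pre-warming features. Readiness probes were adjusted so that a new pod is marked ready only once it can actually serve requests, and pre-warming then sends it a small, gradually increasing share of traffic before it takes a full load. This eliminated the timeout incidents that had previously affected device reliability.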
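The idea behind pre-warming can be shown as a load-balancing weight that ramps up with a pod's uptime. The curve, the 120-second window, and the `warmup_weight` function are assumptions for illustration only. The article does not state which curve or parameters MSE applies.

```python
def warmup_weight(uptime_s: float, warmup_s: float = 120.0,
                  exponent: float = 2.0, floor: float = 0.01) -> float:
    """Relative weight of a pod, from `floor` at startup to 1.0 when warm.

    An exponent above 1 keeps the weight low early on, so caches, JIT
    compilation, and connection pools can warm up under light load.
    """
    if uptime_s < 0:
        raise ValueError("uptime_s must be non-negative")
    if warmup_s <= 0 or uptime_s >= warmup_s:
        return 1.0
    return max(floor, (uptime_s / warmup_s) ** exponent)


def traffic_share(new_pod_weight: float, warm_pods: int) -> float:
    """Share of requests a single new pod gets next to fully warm pods."""
    return new_pod_weight / (new_pod_weight + warm_pods)
```

With these defaults, a pod that is halfway through its warm-up window has a weight of 0.25, so next to three warm pods it receives 0.25 / 3.25 of the requests, a little under 8%.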

Lossless up/down diagram

Observability System

Early integration with Alibaba Cloud SLS log service revealed issues such as inconsistent request indexing and high CPU usage from async logging. The team standardized log formats, injected SkyWalking TraceId into SLS, and modified ThreadLocal handling to propagate trace context across async boundaries. SkyWalking agents and custom plugins feed data into Alibaba Cloud’s tracing service, enabling rapid performance issue detection.
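The async-boundary problem is easiest to see in miniature. The team's fix was made in Java around ThreadLocal and is not shown in the article. The sketch below is a Python analogue using `contextvars`, and the names `trace_id_var`, `format_log`, `submit_with_context`, and `handle_request` are hypothetical.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Per-request trace ID; "-" marks a log line with no trace context.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


def format_log(message: str) -> str:
    """Prefix a log line with the trace ID visible in the current context."""
    return f"[traceId={trace_id_var.get()}] {message}"


def submit_with_context(executor, fn, *args):
    """Submit fn so that it runs inside a copy of the caller's context."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args)


def handle_request(trace_id: str) -> str:
    """Set the trace ID, then log from a worker thread with it preserved."""
    trace_id_var.set(trace_id)
    with ThreadPoolExecutor(max_workers=1) as executor:
        return submit_with_context(executor, format_log, "async step").result()
```

A plain `executor.submit(format_log, ...)` would run in the worker thread's own context, which on typical CPython builds does not carry the caller's trace ID. Capturing the context at submission time is the same idea as wrapping a Java `Runnable` so it restores the submitting thread's ThreadLocal values.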

Metrics are visualized via ARMS ACK, ARMS ACK Pro, and Grafana dashboards covering cluster, node, and pod dimensions. Alerts span cloud product metrics, SLS logs, and K8s events, with multi‑channel notifications (phone, SMS, email, Feishu) and noise‑reduction tuning.
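The article does not say how alert noise was reduced. One common technique is to suppress repeats of the same alert within a time window, sketched below. The `AlertDeduplicator` class, the fingerprint format, and the five-minute window are illustrative assumptions, not the team's configuration.

```python
class AlertDeduplicator:
    """Suppress repeats of the same alert within a fixed time window."""

    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s
        self._last_sent = {}  # fingerprint -> time of last notification

    def should_notify(self, fingerprint: str, now_s: float) -> bool:
        """Return True if this alert should be sent, False if suppressed."""
        last = self._last_sent.get(fingerprint)
        if last is not None and now_s - last < self.window_s:
            return False
        self._last_sent[fingerprint] = now_s
        return True
```

The fingerprint would typically combine the alert source and the affected object, for example a rule name plus a pod name, so that distinct problems are never merged.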

CI/CD Efficiency

Previously fragmented across GitLab, Jenkins, and custom scripts, the CI/CD pipeline was consolidated on Alibaba Cloud Codeup and Yunxiao (云效, rendered literally as "Cloud Effect"). Multiple pipelines now support single-region, multi-region, and multi-cloud projects, automating code checkout, build, and deployment to specific K8s clusters and namespaces. After deployment, the pipeline sends a Feishu notification and runs Newman-driven automated tests to assess whether the release is safe.
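A post-deployment gate on the Newman results could look like the sketch below. It assumes the report was produced with Newman's JSON reporter and that assertion counts sit under `run.stats.assertions`. That shape should be checked against the Newman version in use, and the `release_is_safe` function and its threshold are hypothetical.

```python
def release_is_safe(report: dict, max_failed_assertions: int = 0) -> bool:
    """Decide whether a release passes, based on a parsed Newman report.

    Assumed report shape (verify against your Newman version):
    {"run": {"stats": {"assertions": {"total": int, "failed": int}}}}
    """
    stats = report.get("run", {}).get("stats", {})
    assertions = stats.get("assertions", {})
    total = assertions.get("total", 0)
    failed = assertions.get("failed", 0)
    if total == 0:
        # No assertions ran, so the tests prove nothing: fail closed.
        return False
    return failed <= max_failed_assertions
```

Failing closed when no assertions ran guards against a misconfigured collection being mistaken for a clean test run.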

Stability Assessment and Chaos Engineering

The team conducts chaos experiments targeting Java OOM, cache stampedes, network latency, pod resource exhaustion, and critical cloud services. Findings drive improvements such as multi‑AZ resource distribution, refined alert thresholds, and a high‑availability‑first design philosophy, all verified without impacting production users.

Future Outlook

Looking ahead, Chuangmi plans to replace the heavyweight Spring Cloud Gateway with a cloud‑native gateway powered by WASM plugins, further boosting performance, flexibility, and extensibility while integrating with the existing observability stack.

MSE microservice governance can be enabled by installing the ack-onepilot component via Helm or the component center. Once it is installed, restarting a service automatically injects the governance sidecar, and the provided UI makes full-link gray releases simpler to configure than with Istio.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
