How We Achieved End-to-End Cloud Stability with Micro Frontends and Automated Deployments

This article details a comprehensive, front‑and‑back‑end approach to cloud stability, covering system architecture across private and public clouds, micro‑frontend integration, CI/CD pipelines, SLB routing, health‑check configurations, monitoring dashboards, data reconciliation, UI automation testing, and the resulting improvements in observability, gray‑release, rollback, and incident reduction.

Alibaba Cloud Developer

1. System Architecture

The platform spans private and public cloud nodes, with both front‑end and back‑end services interacting across these environments; the public cloud component is a third‑party black‑box system.

2. Frontend Strategy

To ensure a consistent DingTalk experience, third‑party sub‑pages are unified via a micro‑frontend approach, providing monitoring, gray‑release, and rollback capabilities.

2.1 Micro‑Frontend Architecture

Third‑party resources are packaged under a DingTalk domain, allowing users to access them through the cloud system as if they were a single application.

2.2 Micro‑Frontend Benefits

Domain Unification: Enables gray‑release and rollback via the DBase platform.

Isolation: Deploys third‑party H5 resources in an independent public‑cloud environment, preventing CSS/JS conflicts.

Exception Monitoring: Integrates Arms for error monitoring of front‑end pages.

Version Control: Keeps third‑party updates in sync with the main app for safe rollbacks.

Jsapi Calls: Allows seamless DingTalk Jsapi usage after domain unification.
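The article does not describe how the DBase platform selects gray-release users. As a minimal sketch of the general idea, assuming a percentage-based rollout keyed on a stable user ID, the hypothetical helpers `gray_bucket` and `pick_version` below hash each user into a fixed bucket, so the same user always sees the same version and a rollback amounts to setting the percentage back to 0.

```python
import hashlib


def gray_bucket(user_id: str, salt: str = "isv-page") -> int:
    """Map a user ID to a stable bucket in [0, 100). The salt is a made-up example value."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def pick_version(user_id: str, gray_percent: int, stable: str, candidate: str) -> str:
    """Serve the candidate version only to users whose bucket falls inside the gray percentage."""
    if not 0 <= gray_percent <= 100:
        raise ValueError("gray_percent must be between 0 and 100")
    return candidate if gray_bucket(user_id) < gray_percent else stable
```

With `gray_percent=0` every user gets the stable version, and with `gray_percent=100` every user gets the candidate, which is why widening or rolling back a release needs no per-user state.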

3. Backend Strategy

Stability is treated as a capacity‑bottleneck problem: the weakest module determines overall throughput. The work focuses on four areas: pre‑release control, release availability, post‑release guarantee, and mechanisms and personnel.

3.1 Deployment Pipeline

Using public‑cloud CI/CD on the CloudEffect platform, the pipeline includes:

Create an OSS bucket for artifacts.

Upload built artifacts (JAR, WAR, etc.) with version identifiers.

Define approval workflow (test, product, supervisor).

Replace build steps with OSS artifact download.

Configure ECS group deployment scripts.

Send DingTalk notifications via webhook after deployment.
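The last pipeline step can be sketched as follows. This is an illustrative example, not the team's actual script: it assumes a DingTalk custom robot webhook that accepts a JSON text message, and the function names, the `[deploy]` prefix, and the message wording are hypothetical.

```python
import json
import urllib.request


def build_deploy_message(app: str, version: str, status: str) -> dict:
    """Build a DingTalk text-message payload describing one deployment."""
    return {
        "msgtype": "text",
        "text": {"content": f"[deploy] {app} {version}: {status}"},
    }


def send_dingtalk(webhook_url: str, message: dict, timeout: float = 5.0) -> int:
    """POST the payload to the robot webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A deployment script would call `send_dingtalk(url, build_deploy_message("order-service", "1.4.2", "succeeded"))` as its final step, where the URL is the webhook address issued for the group robot. If the robot uses keyword-based security, the message text has to contain the configured keyword.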

3.2 Release Availability

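During deployments, Nginx distributed requests round‑robin without health checks, so requests could still be sent to instances that were being redeployed, which caused outages. The solution replaces round‑robin forwarding with health‑checked SLB routing. The original Nginx configuration: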

server {
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://proxy-pro;
    }
}
# upstream uses round-robin by default

3.2.1 SLB Forwarding

Configure domain‑based routing on the SLB, mapping domains to backend servers and ports, and enable HTTP HEAD health checks.

After the change, HTTP requests are redirected to HTTPS for better security.
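To illustrate why health-aware routing avoids the outage described in 3.2, here is a small model of domain-based forwarding that skips unhealthy backends. It is a sketch of the concept only, not how SLB is implemented; `Backend`, `DomainRouter`, the domain name, and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Backend:
    address: str          # "ip:port" of one backend server
    healthy: bool = True  # updated by the health checker


@dataclass
class DomainRouter:
    routes: dict[str, list[Backend]]  # domain -> backend servers
    _counter: count = field(default_factory=count, repr=False)

    def pick(self, domain: str) -> Backend:
        """Rotate across the healthy backends registered for a domain."""
        candidates = [b for b in self.routes.get(domain, []) if b.healthy]
        if not candidates:
            raise LookupError(f"no healthy backend for {domain}")
        return candidates[next(self._counter) % len(candidates)]
```

When an instance is taken down for deployment and its health check fails, marking it `healthy = False` removes it from rotation, so requests only reach the remaining instances. Plain round-robin has no such filter and keeps sending a share of traffic to the instance being restarted.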

3.2.2 Health Check Configuration

Health checks send HTTP HEAD requests to backend IP + port + path; responses are compared against expected status codes. Failures trigger alerts.
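The check itself can be approximated in a few lines. The sketch below assumes expected codes are configured as status classes such as `http_2xx` and `http_3xx`; the function names, the default classes, and the timeout are illustrative choices, not values taken from the article.

```python
import http.client


def status_matches(status: int, expected: tuple[str, ...] = ("http_2xx", "http_3xx")) -> bool:
    """Return True when the status code falls in one of the expected classes."""
    return f"http_{status // 100}xx" in expected


def head_check(host: str, port: int, path: str = "/", timeout: float = 3.0) -> bool:
    """Send one HTTP HEAD request to host:port/path and report whether the backend is healthy."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return status_matches(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response all count as a failed check.
        return False
    finally:
        conn.close()
```

A HEAD request keeps the probe cheap because the backend returns headers only. A scheduler would call `head_check` periodically for each backend and raise an alert when it returns False.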

3.3 Post‑Release Assurance

Monitoring includes dashboards for overall system, ECS, and databases, plus DingTalk group alerts. Data reconciliation with third parties is performed via nightly OSS uploads, ODPS tables, and MAC verification tasks.
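The article does not give the record layout or the details of the verification tasks. As a rough sketch of what a nightly reconciliation compares, assume each side exports a map of order ID to amount in cents; the hypothetical `reconcile` function then reports records missing on either side and records whose amounts disagree.

```python
def reconcile(ours: dict[str, int], theirs: dict[str, int]) -> dict[str, list[str]]:
    """Compare two order_id -> amount (cents) maps and list the discrepancies."""
    shared = ours.keys() & theirs.keys()
    return {
        "missing_on_their_side": sorted(ours.keys() - theirs.keys()),
        "missing_on_our_side": sorted(theirs.keys() - ours.keys()),
        "amount_mismatch": sorted(k for k in shared if ours[k] != theirs[k]),
    }
```

An empty result in all three lists means the two sides agree for that day. Any non-empty list is a candidate for an alert in the DingTalk group.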

3.3.1 UI Automation Testing

Automated UI tests run daily to verify third‑party page availability. Sample test code:

def test_Platform_model_trip_business_travel_ticket_booking(self):
    # Wait for page load
    mobile.loop_exist_pic("xx_xxx", subfolder="smart_pic/platform_mode/isv")
    # Click first ticket: horizontally centered, two fifths of the way down the screen
    x = mobile.get_screenshot_resolution()[0] / 2.0 / mobile.get_scale()
    y = mobile.get_screenshot_resolution()[1] / 5.0 * 2 / mobile.get_scale()
    mobile.get_driver().click(x, y)
    # Assert booking button exists ('预订' is the "Book" label shown in the UI)
    assert mobile.loop_exist_text('预订')[0], 'The service provider has no bookable orders'

Failed assertions trigger DingTalk alerts with screenshots for rapid diagnosis.
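One way to wire a failed assertion to an alert is a decorator around each test. This is a hedged sketch rather than the framework the team uses: `alert_on_failure` and its `notify` callback are hypothetical, and attaching the screenshot is left out.

```python
import functools
from typing import Callable


def alert_on_failure(notify: Callable[[str], None]):
    """Wrap a test so that an AssertionError is reported through notify, then re-raised."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            try:
                return test_func(*args, **kwargs)
            except AssertionError as exc:
                notify(f"{test_func.__name__} failed: {exc}")
                raise  # keep the test marked as failed in the runner
        return wrapper
    return decorator
```

The `notify` callback could post to a DingTalk group. Re-raising the error matters: the alert is a side effect, and the test runner still records the failure.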

4. Governance Results

Key improvements after the governance effort:

Full monitoring coverage (✅ vs ❌ before).

Gray‑release capability enabled (✅).

Rollback support added (✅).

Release control mechanisms established (✅).

Monthly incidents reduced from 5 to 0.

5. Future Outlook

The platform will continue strengthening its stability foundation while supporting rapid business growth, emphasizing that stability is a lasting, detail‑oriented battle essential for technical teams.
