How We Achieved End-to-End Cloud Stability with Micro Frontends and Automated Deployments
This article describes an end‑to‑end approach to cloud stability that covers both the front end and the back end. It walks through the system architecture across private and public clouds, micro‑frontend integration, the CI/CD pipeline, SLB routing and health checks, monitoring dashboards, data reconciliation, and UI automation testing, then summarizes the resulting gains in observability, gray release, rollback, and incident reduction.
1. System Architecture
The platform spans private and public cloud nodes, with both front‑end and back‑end services interacting across these environments; the public cloud component is a third‑party black‑box system.
2. Frontend Strategy
To ensure a consistent DingTalk experience, third‑party sub‑pages are unified via a micro‑frontend approach, providing monitoring, gray‑release, and rollback capabilities.
2.1 Micro‑Frontend Architecture
Third‑party resources are packaged under a DingTalk domain, allowing users to access them through the cloud system as if they were a single application.
2.2 Micro‑Frontend Benefits
Domain Unification: Enables gray‑release and rollback via the DBase platform.
Isolation: Deploys third‑party H5 resources in an independent public‑cloud environment, preventing CSS/JS conflicts.
Exception Monitoring: Integrates ARMS (Alibaba Cloud Application Real‑Time Monitoring Service) for error monitoring of front‑end pages.
Version Control: Keeps third‑party updates in sync with the main app for safe rollbacks.
JSAPI Calls: Allows seamless use of the DingTalk JSAPI after domain unification.
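The gray‑release and rollback behavior described above can be pictured with a minimal sketch. It does not show how the DBase platform works internally, which the source does not describe. It only illustrates one common technique, percentage‑based bucketing by a stable hash of the user ID. The function names, the version strings, and the bucketing rule are all assumptions made for illustration.

```python
import hashlib


def in_gray_bucket(user_id: str, percent: int) -> bool:
    """Return True if this user should receive the gray (canary) version."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash the user id so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 99]
    return bucket < percent


def resolve_version(user_id: str, stable: str, gray: str, percent: int) -> str:
    """Pick the resource version to serve for a user (hypothetical helper)."""
    return gray if in_gray_bucket(user_id, percent) else stable
```

In this model, raising `percent` step by step widens the gray release, and setting it back to 0 acts as a rollback because every user resolves to the stable version again.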
3. Backend Strategy
Stability is treated as a capacity‑bottleneck problem: the weakest module determines overall throughput. Four focus areas are prioritized: pre‑release control, release availability, post‑release guarantee, and mechanisms and personnel.
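The bottleneck idea can be made concrete with a small sketch. It assumes a simple serial model in which every request passes through every module, so the slowest module caps the whole system. The module names and capacity figures are hypothetical and do not come from the source.

```python
from typing import Dict, Tuple


def system_throughput(module_capacity: Dict[str, float]) -> Tuple[str, float]:
    """Return the bottleneck module and the throughput it allows (requests/s)."""
    if not module_capacity:
        raise ValueError("at least one module is required")
    # In a serial chain, the module with the lowest capacity limits the system.
    bottleneck = min(module_capacity, key=module_capacity.get)
    return bottleneck, module_capacity[bottleneck]


# Hypothetical capacities, for illustration only.
capacities = {"gateway": 1200.0, "order-service": 300.0, "database": 800.0}
name, qps = system_throughput(capacities)
print(f"bottleneck={name}, max throughput={qps} req/s")
```

Under this model, improving any module other than the bottleneck leaves overall throughput unchanged, which is why the weakest area is addressed first.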
3.1 Deployment Pipeline
Using public‑cloud CI/CD on the CloudEffect platform, the pipeline includes:
Create an OSS bucket for artifacts.
Upload built artifacts (JAR, WAR, etc.) with version identifiers.
Define an approval workflow (testing, product, supervisor).
Replace the build step with an OSS artifact download.
Configure ECS group deployment scripts.
Send DingTalk notifications via webhook after deployment.
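Two of the steps above, versioned artifact upload and the post‑deployment notification, can be sketched as follows. The object‑key layout (`app/version/filename`), the message wording, and all function names are assumptions. The payload follows the text‑message shape used by DingTalk custom robots (`msgtype` plus `text.content`), and the webhook URL, including its access token, must come from your own robot configuration.

```python
import json
import urllib.request


def artifact_key(app: str, version: str, filename: str) -> str:
    """Build a versioned object key for the artifact bucket (layout is assumed)."""
    return f"{app}/{version}/{filename}"


def build_deploy_message(app: str, version: str, succeeded: bool) -> dict:
    """Build a DingTalk custom-robot text message describing a deployment."""
    status = "succeeded" if succeeded else "FAILED"
    return {
        "msgtype": "text",
        "text": {"content": f"[deploy] {app} {version} {status}"},
    }


def send_to_dingtalk(webhook_url: str, message: dict, timeout: float = 5.0) -> int:
    """POST the message to a robot webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping the version in the object key means a rollback can simply redeploy an earlier key instead of rebuilding. If the robot is configured with a keyword filter, the message content has to contain that keyword (here it could be `deploy`).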
3.2 Release Availability
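The original Nginx proxy distributed requests round‑robin without health checks, so during a deployment requests were still sent to instances that were unavailable, which caused outages. The solution replaces the round‑robin proxy with health‑checked SLB routing. The original Nginx configuration: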
server {
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://proxy-pro;
    }
}
# upstream uses round-robin by default
3.2.1 SLB Forwarding
Configure domain‑based routing on the SLB, mapping domains to backend servers and ports, and enable HTTP HEAD health checks.
After the change, HTTP requests are redirected to HTTPS for better security.
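The forwarding behavior can be modeled in a few lines. This is a conceptual sketch of the decision the SLB makes, not SLB configuration. The domains, backend addresses, and the `route` function are hypothetical placeholders.

```python
from typing import Dict, Tuple

# Hypothetical domain-to-backend table; hosts and ports are placeholders.
ROUTES: Dict[str, Tuple[str, int]] = {
    "app.example.com": ("10.0.0.11", 8080),
    "api.example.com": ("10.0.0.12", 9090),
}


def route(scheme: str, host: str, path: str) -> Tuple[str, str]:
    """Decide what to do with a request.

    Returns ("redirect", url) for plain HTTP, ("forward", "ip:port") when a
    domain rule matches, and ("reject", reason) when no rule exists.
    """
    if scheme == "http":
        # Plain HTTP is never forwarded; the client is sent to HTTPS instead.
        return "redirect", f"https://{host}{path}"
    backend = ROUTES.get(host.lower())
    if backend is None:
        return "reject", "no rule for host"
    ip, port = backend
    return "forward", f"{ip}:{port}"
```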
3.2.2 Health Check Configuration
Health checks send HTTP HEAD requests to each backend's IP, port, and path, and the response status is compared against the expected status codes. Failures trigger alerts.
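A minimal sketch of such a probe, written with the Python standard library, is shown below. The SLB performs this check itself, so the code only illustrates the logic. The accepted status range (2xx and 3xx), the timeout, and the function names are assumptions, and `alert` stands in for whatever alerting channel is used.

```python
import http.client
from typing import Callable, Container


def head_status(host: str, port: int, path: str, timeout: float = 2.0) -> int:
    """Send an HTTP HEAD request and return the status code, or 0 on failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return 0
    finally:
        conn.close()


def is_healthy(status: int, expected: Container[int] = range(200, 400)) -> bool:
    """Compare the observed status with the expected status codes."""
    return status in expected


def check_backend(
    host: str,
    port: int,
    path: str,
    probe: Callable[[str, int, str], int] = head_status,
    alert: Callable[[str], None] = print,
) -> bool:
    """Probe one backend and raise an alert if it is unhealthy."""
    status = probe(host, port, path)
    healthy = is_healthy(status)
    if not healthy:
        alert(f"health check failed: {host}:{port}{path} returned {status}")
    return healthy
```

Passing the probe and the alert function as parameters keeps the comparison logic testable without a live backend.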
3.3 Post‑Release Assurance
Monitoring includes dashboards for overall system, ECS, and databases, plus DingTalk group alerts. Data reconciliation with third parties is performed via nightly OSS uploads, ODPS tables, and MAC verification tasks.
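The comparison at the core of a reconciliation job can be sketched as below. The OSS upload, the ODPS tables, and the MAC verification step are not modeled here. The record shape (order ID mapped to an amount in cents) and the function name are assumptions chosen for illustration.

```python
from typing import Dict, List


def reconcile(ours: Dict[str, int], theirs: Dict[str, int]) -> Dict[str, List]:
    """Compare two {order_id: amount_in_cents} snapshots and list differences."""
    our_ids, their_ids = set(ours), set(theirs)
    return {
        # Orders we recorded that the third party did not report.
        "missing_in_theirs": sorted(our_ids - their_ids),
        # Orders the third party reported that we have no record of.
        "missing_in_ours": sorted(their_ids - our_ids),
        # Orders present on both sides with different amounts.
        "amount_mismatch": sorted(
            (oid, ours[oid], theirs[oid])
            for oid in our_ids & their_ids
            if ours[oid] != theirs[oid]
        ),
    }
```

An empty result in all three lists means the two snapshots agree. Any non-empty list is a candidate for an alert.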
3.3.1 UI Automation Testing
Automated UI tests run daily to verify third‑party page availability. Sample test code:
def test_Platform_model_trip_business_travel_ticket_booking(self):
    # `mobile` is the project's UI automation helper (not shown in the source)
    # Wait for the page to load
    mobile.loop_exist_pic("xx_xxx", subfolder="smart_pic/platform_mode/isv")
    # Tap the first ticket: horizontally centered, two fifths down the screen
    x = mobile.get_screenshot_resolution()[0] / 2.0 / mobile.get_scale()
    y = mobile.get_screenshot_resolution()[1] / 5.0 * 2 / mobile.get_scale()
    mobile.get_driver().click(x, y)
    # Assert that the booking button (on-screen text '预订', "Book") exists
    assert mobile.loop_exist_text('预订')[0], 'The service provider has no bookable orders'

Failed assertions trigger DingTalk alerts with screenshots for rapid diagnosis.
4. Governance Results
Key improvements after the governance effort:
Full monitoring coverage (✅ vs ❌ before).
Gray‑release capability enabled (✅).
Rollback support added (✅).
Release control mechanisms established (✅).
Monthly incidents reduced from 5 to 0.
5. Future Outlook
The platform will continue strengthening its stability foundation while supporting rapid business growth, emphasizing that stability is a lasting, detail‑oriented battle essential for technical teams.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
