SRE Secrets: How Alibaba, Tencent & Dewu Build Ultra-Stable Cloud‑Native Services
On November 25, Dewu Technology hosted an SRE Stability Engineering salon in Hangzhou. Experts from Alibaba, Tencent, Ant Group, and Dewu shared practical insights on the reliability of consumer‑facing (C‑end) request links, Alibaba’s system stability operations, Tencent Games’ cloud‑native SRE practices, and Ant Group’s chaos engineering. The event concluded with a Q&A and distribution of the presentation materials.
Event Overview
On November 25, Dewu Technology organized the "SRE Stability Engineering Exploration & Practice" salon at its Hangzhou R&D center. The event attracted technical experts from Alibaba, Tencent, Ant Group, and Dewu, and was streamed online, receiving over 10,000 likes.
Opening Remarks
Dewu’s Head of Reliability and Security, He Jun, welcomed attendees, highlighted the SRE team’s achievements over the past year, and outlined future goals to enhance system reliability and resilience.
Speaker 1 – Xu Wenhao (Dewu)
Topic: Practices for Ensuring Core C‑End Link Stability at Dewu. Xu described how rapid business expansion increased the complexity of consumer‑facing request links and made traffic more volatile. He presented Dewu’s strategies for monitoring, capacity planning, automated canary releases, and incident mitigation, which together maintain high availability for consumer‑facing services.
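To make the idea of an automated canary release concrete, here is a minimal sketch of a promotion gate that compares a canary's error rate against the stable baseline. It is an illustration only, not Dewu's actual system: the names `ReleaseStats` and `canary_verdict`, the minimum sample size, and the tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStats:
    """Request and error counts observed for one release (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when no traffic has been observed yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: ReleaseStats, canary: ReleaseStats,
                   min_requests: int = 500, tolerance: float = 0.005) -> str:
    """Return 'wait', 'rollback', or 'promote' for the canary.

    min_requests and tolerance are illustrative values, not recommendations.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge the canary
    if canary.error_rate > baseline.error_rate + tolerance:
        return "rollback"  # canary is measurably worse than baseline
    return "promote"


if __name__ == "__main__":
    stable = ReleaseStats(requests=10_000, errors=10)
    print(canary_verdict(stable, ReleaseStats(requests=1_000, errors=2)))
    print(canary_verdict(stable, ReleaseStats(requests=1_000, errors=20)))
```

A real gate would typically look at more signals than error rate (latency, saturation, business metrics) and would use a statistical comparison rather than a fixed tolerance.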
Speaker 2 – Zhao Jiaqi (Alibaba GOC)
Topic: Alibaba Group’s System Stability Operations Management. Zhao covered four key modules: monitoring and alerting, emergency response, fault management, and change control. He shared Alibaba’s GOC framework for handling large‑scale business scenarios, including multi‑level alert thresholds, post‑mortem analysis processes, and automated rollback mechanisms.
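Multi‑level alert thresholds can be pictured as a mapping from a metric value to a severity level. The sketch below is a generic illustration and does not describe GOC's real rules: the level names (`P1` to `P3`), the threshold values, and the `classify` function are assumptions made for the example.

```python
from typing import Optional

# Hypothetical thresholds on an error-rate metric, highest severity first.
THRESHOLDS = [
    ("P1", 0.05),  # page on-call immediately
    ("P2", 0.02),  # open an incident ticket
    ("P3", 0.01),  # notify the owning team
]


def classify(error_rate: float) -> Optional[str]:
    """Return the most severe level whose threshold is met, or None."""
    for level, floor in THRESHOLDS:
        if error_rate >= floor:
            return level
    return None  # below every threshold: no alert


if __name__ == "__main__":
    for rate in (0.06, 0.02, 0.015, 0.001):
        print(rate, classify(rate))
```

Ordering the table from most to least severe means the first match is always the right one, so adding a level only requires inserting a row in the correct position.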
Speaker 3 – Wang Jie (Tencent Games)
Topic: Cloud‑Native Service Practices for Complex Heterogeneous Games. Wang explained how Tencent Games leverages the BlueKing Container Management Platform and custom plugins to adopt cloud‑native technologies. He detailed the use of declarative application definitions, GitOps workflows, and service mesh integration to reduce operational costs and improve delivery speed.
Speaker 4 – Zhang Hao (Ant Group)
Topic: Chaos Engineering Theory and Practice at Ant Group. Zhang focused on financial‑system safety, describing how ChaosMeta (Ant’s open‑source chaos platform) is used to inject failures, test high‑availability mechanisms, and validate change procedures. He emphasized the importance of systematic chaos experiments for risk mitigation.
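The basic shape of a chaos experiment (check steady state, inject a fault, confirm the high‑availability mechanism holds, then recover) can be sketched in a few lines. The example below deliberately does not use ChaosMeta's API; `PriceService`, `inject_outage`, and `price_with_fallback` are hypothetical stand‑ins that simulate a dependency outage in process.

```python
from contextlib import contextmanager


class PriceService:
    """Hypothetical dependency whose failure we want to tolerate."""

    def __init__(self) -> None:
        self.healthy = True

    def get_price(self, sku: str) -> int:
        if not self.healthy:
            raise ConnectionError("injected fault")
        return 100


@contextmanager
def inject_outage(service: PriceService):
    """Simulate an outage for the duration of the with-block."""
    service.healthy = False
    try:
        yield
    finally:
        service.healthy = True  # always restore, even if the check fails


def price_with_fallback(service: PriceService, sku: str, cached: int = 99) -> int:
    """The high-availability mechanism under test: fall back to a cached value."""
    try:
        return service.get_price(sku)
    except ConnectionError:
        return cached


def run_experiment(service: PriceService) -> dict:
    before = price_with_fallback(service, "sku-1")      # steady state
    with inject_outage(service):
        during = price_with_fallback(service, "sku-1")  # fault injected
    after = price_with_fallback(service, "sku-1")       # recovery
    return {"before": before, "during": during, "after": after}


if __name__ == "__main__":
    print(run_experiment(PriceService()))
```

The `finally` clause matters: an experiment that can leave the fault in place after an unexpected error becomes an incident of its own.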
Speaker 5 – Zhan Yuerong (Dewu)
Topic: Asset‑Loss Prevention Technical System Overview. Zhan presented a problem‑driven approach to preventing and containing asset‑loss incidents, outlining the detection pipeline, automated quarantine actions, and post‑incident analysis that together form Dewu’s loss‑prevention framework.
Tea Break
A short intermission provided refreshments, allowing participants to network and discuss the presented material.
Resource Collection
Attendees could obtain the full set of slide decks by leaving a comment with the keyword "PPT" on the Dewu Technology official account.
Closing Remarks
Dewu Technology reiterated its commitment to building a learning‑oriented engineering culture, encouraging knowledge sharing through regular salons, and inviting engineers to contribute to future technical discussions.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.