Tagged articles
4 articles
Tencent Cloud Developer
Jun 8, 2023 · Operations

Stability Governance in Tencent Search: Architecture, Incident Management, and Automation

The article outlines Tencent Search’s stability governance, detailing a multi‑layered availability architecture, disaster‑recovery mechanisms, precise monitoring, rapid emergency workflows, pre‑release interception, extensive automation, and a collaborative governance model that together enhance system resilience, incident detection, and swift remediation.

availability architecture · incident response · monitoring
28 min read
Architect
May 16, 2023 · Operations

Stability Engineering Practices for the DuoliXiong Local Service Platform

This article outlines the stability engineering approach for Baidu's DuoliXiong local service platform. It covers business challenges, architectural design, development standards, code review, deployment processes, monitoring, and consistency solutions, and presents practical implementations such as automated scaling, fault tolerance, and eventual consistency mechanisms.

Microservices · monitoring · stability engineering
13 min read
Efficient Ops
Jun 1, 2021 · Operations

Mastering System Stability: Building a Chaos‑Driven Platform for Financial Ops

This article details how a major securities firm analyzed business stability, built a comprehensive stability engineering platform around chaos engineering, and ran extensive fault‑injection drills. It also outlines future directions such as random‑scenario exercises, red‑team/blue‑team drills, and AI‑driven risk detection.

Operations · chaos engineering · financial systems
11 min read
Baidu Geek Talk
May 26, 2021 · Operations

How Baidu Engineers Scalable Service Governance: Capacity, Traffic, and Stability

This interview details Baidu's practical approach to microservice governance. It covers how Baidu defines governance, the evolution from ad‑hoc scaling to automated capacity, traffic, and stability engineering, and the challenges of data collection, standardized interfaces, and decision‑making policies for large‑scale systems.

Microservices · Service Mesh · capacity management
12 min read