Stability Engineering Practices for the DuoliXiong Local Service Platform
This article outlines the stability engineering approach behind Baidu's DuoliXiong local service platform. It covers business challenges, architectural design, development standards, code review, deployment processes, monitoring, and consistency solutions, and presents practical implementations such as automated scaling, fault tolerance, and eventual consistency mechanisms.
DuoliXiong is Baidu's SaaS platform for local services: it helps merchants acquire customers and gives users access to local dining, entertainment, and leisure services. The platform consists of three main products: the Merchant Platform (PC backend, mini‑program, and app for merchants), the Operations Platform (internal management for merchant and product reviews), and the User Platform (consumer‑facing mini‑programs and apps).
The rapid growth of micro‑service modules (user, product, order, merchant, coupon, payment, etc.) introduced challenges such as ensuring architectural robustness, managing long service call chains, handling numerous external dependencies, and maintaining short iteration cycles.
01 Business Introduction
Key challenges include the explosion of micro‑service modules, extensive internal and external service dependencies, and the need for fast, reliable iteration.
02 Construction Philosophy
Stability work is driven along three dimensions: technical specifications, business specifications, and micro‑service implementation. Practices include requirement analysis, cross‑team alignment, risk assessment, documentation, and continuous review.
03 Implementation Process
3.1 Solution Design – Clarify requirements, ensure feasibility, align cross‑team expectations, assess risks, and produce detailed documentation (version, development docs, project background, technical solution, interface design, storage design, compatibility, monitoring, and release plan).
3.2 Technical Review – Establish a review team, define evaluation criteria (architecture, interfaces, performance, data models, core flows), and conduct periodic reviews to ensure architectural soundness.
3.3 Coding Standards – Support code quality, development efficiency, collaboration, and rapid iteration through guidelines for coding, security, MySQL usage, logging, and exception handling.
3.4 Code Review – Detect logical errors, enforce style consistency, and verify robustness across layers, configurations, and SQL usage.
3.5 Deployment – Follow a structured release process: prepare release documents, announce windows, perform preview deployments, execute staged roll‑outs, and verify post‑deployment health.
3.6 Issue Handling – Prioritize rapid notification, damage control, and root‑cause analysis; separate bug‑fix releases from feature releases to minimize risk.
04 Practices
4.1 Stability Loop – Integrate monitoring, alerting, and continuous improvement into the development lifecycle.
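4.2 Eventual Consistency – Use asynchronous calls backed by a local message table: the business change and the outgoing message are written in the same local transaction, and the message is delivered afterwards with retries, so data does not diverge across services.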
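The local message table pattern can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: SQLite stands in for the service database, and the table names (`orders`, `local_message`), the topic `order.created`, and the `send` callback are all hypothetical.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one business table plus the local message table.
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE local_message (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            topic TEXT NOT NULL,
            payload TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'PENDING',
            attempts INTEGER NOT NULL DEFAULT 0
        );
    """)


def place_order(conn, order_id):
    # The business row and the message row commit or roll back together,
    # so a message exists if and only if the order exists.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'CREATED')", (order_id,)
        )
        conn.execute(
            "INSERT INTO local_message (topic, payload) VALUES (?, ?)",
            ("order.created", str(order_id)),
        )


def relay_pending(conn, send, max_attempts=5):
    """Deliver pending messages; return how many were sent in this pass."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM local_message "
        "WHERE status = 'PENDING' AND attempts < ? ORDER BY id",
        (max_attempts,),
    ).fetchall()
    delivered = 0
    for msg_id, topic, payload in rows:
        try:
            send(topic, payload)  # e.g. publish to an MQ or call a downstream API
        except Exception:
            # Leave the row PENDING so a later pass retries it.
            with conn:
                conn.execute(
                    "UPDATE local_message SET attempts = attempts + 1 WHERE id = ?",
                    (msg_id,),
                )
            continue
        with conn:
            conn.execute(
                "UPDATE local_message SET status = 'SENT' WHERE id = ?", (msg_id,)
            )
        delivered += 1
    return delivered
```

A scheduled job would call `relay_pending` periodically. Because a message can be sent and then fail to be marked `SENT`, delivery is at-least-once, which is why the consumer side needs to tolerate duplicates.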
4.3 Idempotency & Retry – Distinguish between simple duplicate prevention (rejecting a repeated request) and strict idempotency (a repeated request yields the same result as the first), and handle scenarios such as duplicate submissions, timeout retries, message re‑consumption, and high‑concurrency ID collisions.
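The difference between the two behaviours can be shown with a small in-memory sketch. The class names and the request keys are hypothetical, and the in-process lock and dictionary are stand-ins for whatever shared store a real deployment would use (for example a unique database index).

```python
import threading


class DuplicateRequest(Exception):
    """Raised when a request key has already been submitted."""


class Deduplicator:
    """Duplicate prevention: a second submission with the same key is rejected."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def submit(self, key, action):
        with self._lock:
            if key in self._seen:
                raise DuplicateRequest(key)
            # The key stays marked even if action() later fails; a real system
            # must decide whether to release it in that case.
            self._seen.add(key)
        return action()


class IdempotentExecutor:
    """Strict idempotency: a repeat returns the first result without re-running."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def execute(self, key, action):
        # Holding the lock while running action() serialises all callers.
        # That keeps the sketch simple but would be too coarse in production.
        with self._lock:
            if key not in self._results:
                self._results[key] = action()
            return self._results[key]
```

A timeout retry is the typical case for `IdempotentExecutor`: the caller cannot tell whether the first attempt succeeded, so it repeats the call with the same key and gets the original outcome. `Deduplicator` fits a double-clicked submit button, where the second request should simply be refused.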
4.4 Monitoring & Alerts – Deploy cloud‑native monitoring (Prometheus, Grafana) and log services (Trace, Tianyan) to capture metrics, logs, and alerts, enabling rapid fault isolation.
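To make the alerting idea concrete, here is a stdlib-only sketch of a sliding-window error-rate check. It is not Prometheus or its rule language, only an illustration of the kind of condition an alert rule expresses; the class name, the window, the threshold, and the minimum-traffic guard are assumed values.

```python
from collections import deque


class ErrorRateAlert:
    """Fires when the error ratio over a sliding time window exceeds a threshold."""

    def __init__(self, window_seconds=60.0, threshold=0.05, min_requests=20):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_requests = min_requests  # avoid alerting on a handful of requests
        self._events = deque()  # (timestamp, is_error), oldest first

    def record(self, timestamp, is_error):
        self._events.append((timestamp, is_error))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def firing(self, now):
        self._evict(now)
        total = len(self._events)
        if total < self.min_requests:
            return False
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / total > self.threshold
```

The minimum-traffic guard is a design choice: without it, one failed request during a quiet period would produce a 100% error rate and a noisy alert.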
4.5 Additional Practices – Implement automated scaling based on custom Prometheus metrics, intelligent fault tolerance for core business flows, and degradation strategies for dependent services (Redis, MQ).
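Two of these practices lend themselves to a short sketch. The article does not describe the platform's scaling rule, so the function below assumes the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × metric / target)). The replica bounds, the tolerance, and the `with_fallback` helper are hypothetical.

```python
import math


def desired_replicas(current, metric_value, target_value,
                     min_replicas=2, max_replicas=20, tolerance=0.1):
    """Proportional scaling on a custom metric (assumed rule, see lead-in)."""
    if current <= 0 or target_value <= 0:
        raise ValueError("current and target_value must be positive")
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: do not scale, which avoids flapping.
        wanted = current
    else:
        wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


def with_fallback(primary, fallback):
    """Degradation wrapper: if the dependency call fails, use the fallback."""
    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            # For example, Redis is unreachable: read from the database instead.
            return fallback(*args, **kwargs)
    return call
```

For instance, with 4 replicas and a per-replica metric of 150 against a target of 100, `desired_replicas(4, 150, 100)` asks for 6 replicas. Wrapping a cache read as `with_fallback(read_cache, read_db)` keeps the core flow serving when the cache is down, at the cost of extra load on the fallback path.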
The article concludes with a call for community engagement, sharing of technical insights, and contact information for further collaboration.