Comprehensive Dependency Governance for High‑Availability Backend Systems

This article outlines a systematic approach to dependency governance in high‑traffic backend services, covering service classification, rate limiting, Dubbo, HTTP, database, and message‑queue management to enhance availability, reduce failure impact, and improve overall system stability.

Qunar Tech Salon

Background

The authors previously shared their cache governance practice. This article extends that stability work to system‑level dependencies: the external components and interfaces a system relies on, and the services they expose (Dubbo, HTTP, DB, MQ, etc.).

Governance Plan

Service Classification and Dependency Governance

1) Applications are graded (P1, P2, P3) based on business core importance and impact, and dependencies are mapped accordingly.

2) P1 services must be deployed across multiple data centers, ensuring that no single data center holds more than half of the online instances, thereby reducing the impact of a single‑site failure.

3) Strong dependencies are weakened to enable degradation; weak dependencies are made asynchronous to allow circuit‑breaking. Critical‑to‑critical calls receive pre‑planned fallback strategies, while non‑critical calls are isolated to prevent cascading failures.
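
Two of the rules above can be sketched in plain Java. The class and method names (`DependencyRules`, `spreadAcrossDataCenters`, `callWithFallback`) are hypothetical and not taken from the original article. The first method checks the "no more than half in one data center" rule; the second shows one simple way to degrade a call, using a timeout with a fallback value. It is not a full circuit breaker.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class DependencyRules {

    /** Returns true when no single data center holds more than half of all instances. */
    static boolean spreadAcrossDataCenters(Map<String, Integer> instancesPerDataCenter) {
        int total = 0;
        for (int count : instancesPerDataCenter.values()) {
            total += count;
        }
        if (total == 0) {
            return false; // nothing deployed: treat as a violation (assumption)
        }
        for (int count : instancesPerDataCenter.values()) {
            if (count * 2 > total) {
                return false; // this data center holds more than half
            }
        }
        return true;
    }

    /**
     * Calls a dependency asynchronously and degrades to a fallback value
     * when the call fails or does not finish within the timeout.
     */
    static <T> CompletableFuture<T> callWithFallback(
            Supplier<T> dependency, T fallback, long timeoutMillis) {
        return CompletableFuture.supplyAsync(dependency)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback);
    }
}
```

With this shape, the caller always receives a value, so a slow or failing weak dependency cannot block the main flow for longer than the timeout.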

Rate Limiting

The team adopts a unified Sentinel component for traffic control, providing dynamic rate limiting for Dubbo and HTTP interfaces, business‑level throttling based on request parameters, and optional cluster‑wide limits. Limits are set conservatively so that ordinary traffic spikes are not throttled and user experience is not degraded.
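
To illustrate what throttling "based on request parameters" means, here is a minimal fixed‑window limiter keyed by a parameter value such as a user ID. It is a hand‑written stand‑in for the idea, not Sentinel's API; the class name `ParamRateLimiter` and its parameters are assumptions made for this sketch. The current time is passed in explicitly so the behavior is easy to verify.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ParamRateLimiter {

    /** One counting window per parameter value. */
    private static final class Window {
        final long startMillis;
        final AtomicInteger count = new AtomicInteger();

        Window(long startMillis) {
            this.startMillis = startMillis;
        }
    }

    private final int limitPerWindow;
    private final long windowMillis;
    private final ConcurrentHashMap<String, Window> windows = new ConcurrentHashMap<>();

    ParamRateLimiter(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request for this key is allowed at the given time. */
    boolean tryAcquire(String key, long nowMillis) {
        // Start a fresh window when none exists or the current one has expired.
        Window window = windows.compute(key, (k, current) ->
                (current == null || nowMillis - current.startMillis >= windowMillis)
                        ? new Window(nowMillis)
                        : current);
        return window.count.incrementAndGet() <= limitPerWindow;
    }
}
```

A fixed window is the simplest choice and allows short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state.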

Dubbo Governance

Key measures include monitoring Dubbo thread pools, isolating core and non‑core interfaces into separate thread pools, and configuring reasonable timeout values on both provider and consumer sides.
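
The isolation idea can be sketched with two bounded JDK thread pools. This is not Dubbo configuration; in Dubbo the equivalent settings live in the framework's own thread pool and timeout options. The class name `IsolatedPools` and the pool sizes are assumptions chosen for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class IsolatedPools {

    // Separate, bounded pools: saturating one cannot consume the other's threads.
    final ThreadPoolExecutor core = newPool(8, 100);
    final ThreadPoolExecutor nonCore = newPool(2, 10);

    private static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // reject fast instead of queueing without bound
    }

    /** Approximate share of threads currently busy; a candidate metric for monitoring. */
    static double utilization(ThreadPoolExecutor pool) {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    void shutdown() {
        core.shutdownNow();
        nonCore.shutdownNow();
    }
}
```

When the non‑core pool is full, further non‑core work is rejected immediately while the core pool keeps accepting requests, which is the behavior the isolation is meant to guarantee.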

HTTP Governance

Practices involve setting appropriate timeout thresholds, encouraging asynchronous calls, implementing controlled retries, and isolating thread pools and clients to prevent cross‑interference.
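
A minimal sketch of explicit timeouts and bounded retries, using the JDK's `java.net.http.HttpClient` (Java 11+). The class name `BoundedHttp`, the timeout values, and the retry helper are assumptions for illustration and are not taken from the original article. Retries are only safe for idempotent requests such as GET.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Supplier;

final class BoundedHttp {

    // A dedicated client per downstream keeps connection settings isolated.
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofMillis(200)) // illustrative value
            .build();

    /** Performs a GET with a per-request timeout and returns the body. */
    String get(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofMillis(500)) // illustrative value
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) { // includes HttpTimeoutException
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while calling " + url, e);
        }
    }

    /** Runs the call up to maxAttempts times and rethrows the last failure. */
    static <T> T withRetries(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last;
    }
}
```

A call site could look like `BoundedHttp.withRetries(() -> http.get(url), 2)`. Keeping the attempt count small matters, because every retry multiplies load on a downstream that may already be struggling.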

Database Governance

High availability is ensured through multi‑replica storage, rapid recovery mechanisms, and removal of unnecessary data. Query performance is monitored, and MyBatis interceptors are used to detect problems early.
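
The timing logic behind slow‑query detection can be sketched without any framework. In a MyBatis setup, logic of this kind would typically sit inside a plugin that implements MyBatis's `Interceptor` interface; the version below is a plain‑Java approximation, and the class name `SlowQueryMonitor`, the threshold, and the statement IDs are assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

final class SlowQueryMonitor {

    private final long thresholdNanos;
    private final List<String> slowStatements = new CopyOnWriteArrayList<>();

    SlowQueryMonitor(long thresholdMillis) {
        this.thresholdNanos = thresholdMillis * 1_000_000L;
    }

    /** Runs the query and records its statement ID if it meets or exceeds the threshold. */
    <T> T timed(String statementId, Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            if (elapsed >= thresholdNanos) {
                // A real setup would emit a metric or alert here instead.
                slowStatements.add(statementId);
            }
        }
    }

    /** Snapshot of statement IDs that were flagged as slow. */
    List<String> slowStatements() {
        return List.copyOf(slowStatements);
    }
}
```

Recording in a `finally` block means queries that fail after a long wait are flagged as well, which is often when the signal matters most.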

MQ Governance

The approach handles single‑MQ failures or message backlogs by enabling fast failover to alternative channels and using multiple topics or MQ clusters to avoid data loss, while keeping consumption idempotent so that messages delivered more than once take effect only once.
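
A minimal sketch of failover publishing combined with idempotent consumption. The class name `MqSafety` and the use of plain `Consumer<String>` functions in place of real MQ clients are assumptions for illustration. The processed‑ID set is kept in memory here; a production system would more likely need a durable store shared across consumer instances.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

final class MqSafety {

    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> handler;

    MqSafety(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Sends through the primary channel and falls back to the backup if that fails. */
    static void publishWithFailover(Consumer<String> primary, Consumer<String> backup, String message) {
        try {
            primary.accept(message);
        } catch (RuntimeException primaryFailure) {
            backup.accept(message); // failover path; may lead to duplicates downstream
        }
    }

    /** Handles each message ID at most once; returns false for duplicates. */
    boolean onMessage(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            return false; // already handled, skip
        }
        try {
            handler.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedIds.remove(messageId); // allow redelivery after a failed attempt
            throw e;
        }
    }
}
```

Failover and multi‑channel delivery make duplicates more likely, which is why the two techniques are paired: the producer side favors delivery, and the consumer side absorbs the repeats.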

Additional Practices

Monitoring is enhanced for Dubbo, HTTP, and DB operations; dashboards include app‑code dimensions for quick inspection; and timeout configurations are regularly reviewed for optimal values.

Governance Process

The workflow mirrors previous cache governance: identify scenarios, define solutions, develop and test, deploy, and conduct online drills with iterative improvements. Deployment is staged, first adding rate‑limiting components and monitoring, then optimizing based on observed metrics.

Summary

Post‑incident reviews drive proactive measures that reduce failure frequency, duration, and impact. Dependency governance is an ongoing effort, with future plans to automate dependency tagging and integrate with dedicated service‑governance platforms for dynamic detection and rapid response.

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
