Configuration‑as‑Code Platform for Multi‑Region Deployment: Design, Implementation and Practices
To overcome Didi’s costly, months‑long, manual multi‑region rollouts, the team built a Configuration‑as‑Code platform that isolates environment settings in a dedicated repository, enforces a versioned template with validation rules, and integrates automated placeholder substitution into CI/CD, cutting manual effort by roughly 80 % and removing coordination bottlenecks.
Background
With the rapid international expansion of Didi’s ride‑hailing services, the company now operates in many countries and needs to deploy services across multiple data‑centers to reduce latency and improve user experience. From 2020 onward, dozens of data‑center deployments have been performed. Early deployments required extensive manual effort: business developers (RD) had to enumerate resources, request them, adapt code for the new data‑center, perform the deployment, and conduct integration testing. A typical deployment involved hundreds of modules and lasted 2‑3 months, with dozens of people collaborating, leading to high error rates and huge coordination costs.
As cloud‑native technologies mature, the number of micro‑services continues to grow, making it essential to identify the root causes of deployment inefficiency.
Root‑Cause Analysis
When delivering a batch of services, RD work focuses on three areas:
Resource inventory and request
Code changes to adapt to the new environment
Deployment and integration testing
The first item is already addressed by the Application Center. The remaining two items—massive code changes for environment adaptation and low deployment efficiency—are the primary problems this article tackles.
Step 1: Identify Scattered Environment Differences
Environment differences appear in three main forms: configuration files that differ per data-center, values hard-coded in source, and business-logic branches that check environment identifiers. Because these differences are scattered across many services, they must be re-inventoried for every new data-center deployment. Consolidating all differences into a single configuration set per service would bound the inventory work, but storing configuration and code in the same Git repository makes that practice hard to enforce.
Step 2: Adapt Services to the New Environment
Each service has numerous downstream dependencies (Redis, MySQL, Kafka, etc.). Different data‑centers require different connection strings, credentials, time‑outs, etc. Historically, each RD handled these adaptations individually, leading to duplicated communication and coordination overhead.
Step 3: Deploy After Adaptation
After all services finish their adaptations, they are deployed in parallel. However, the overall timeline is dictated by the slowest service; faster services wait weeks for slower ones, and any code defect forces a re‑synchronisation of the whole batch.
Summary of Problems
Environment differences are scattered and tightly coupled with business code, requiring repeated manual inventory.
There is no standard for representing environment differences, causing information silos and duplicated effort.
Large numbers of developers perform manual deployments, leading to coordination bottlenecks and frequent rework.
Solution Idea
To address Problem 1, configuration must be completely separated from code, stored in a dedicated repository, and the code must never contain environment‑specific identifiers. This aligns with the Twelve‑Factor App principle of separating config from code.
To address Problem 2, a standardized configuration template is defined, covering the most common environment‑specific items (resource middleware, credentials, service endpoints, etc.). The template is versioned and validated against a set of rules.
To address Problem 3, an automated delivery pipeline is built that pulls the separated configuration, substitutes placeholders in the code, and performs a single CI/CD build that includes both code and configuration. If any placeholder cannot be resolved, the build fails, preventing bad configurations from reaching production.
Implementation Details
1. Configuration Separation: All environment-specific values are moved to a dedicated Git repository (one repo per service, keyed by USN, the Unique Service Name). The code references these values via placeholders of the form ^cac{{key1.key2}}, where key1.key2 is the full path to a leaf node in the YAML configuration.
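As a minimal sketch of how placeholder expansion and the fail-fast behavior described below might fit together (the file names, keys, and values here are illustrative, not Didi's actual tooling), a build step can expand ^cac{{key1.key2}} placeholders from a flattened key/value view of the YAML and abort when anything is left unresolved:

```shell
#!/bin/sh
# Illustrative sketch only: file names, keys, and values are hypothetical.
set -e

# Flattened view of the per-service YAML config (key1.key2=value).
cat > cac.flat <<'EOF'
mysql.defaultTag.ip=10.0.0.5
mysql.defaultTag.port=3306
EOF

# Source file containing ^cac{{...}} placeholders.
cat > app.conf.tpl <<'EOF'
db_host=^cac{{mysql.defaultTag.ip}}
db_port=^cac{{mysql.defaultTag.port}}
EOF

# Substitute each known key into a working copy.
cp app.conf.tpl app.conf
while IFS='=' read -r key val; do
  sed -i "s|\^cac{{$key}}|$val|g" app.conf
done < cac.flat

# Fail the build if any placeholder is still unresolved.
if grep -q 'cac{{' app.conf; then
  echo "build failed: unresolved placeholder" >&2
  exit 1
fi
```

A real implementation would pull the flattened view from the configuration repository for the target data-center, but the fail-on-unresolved check is the key property: a typo in a key name breaks the build instead of reaching production.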
2. Standard Template (excerpt):
#1. Resource Middleware
mysql:
  defaultTag:
    username: ''
    password: ''
    database: '' # no cluster difference, delivery source
    ip: ''
    port: 0
    #... other fields ...
redis:
  defaultTag: # tagName
    ip: ''
    port: 0 # must be numeric
#2. Environment Variables
env:
  physical: '' # sim/pre/small/online
  group: '' # simOnline/preview/product
#3. Generic Downstream Services
microService:
  ibt-xxx-usn:
    ip: ''
    port: 0
    domain: '' # domain name
#4. Custom Extensions
extend:
  key: 'value'
3. Validation Rules: Six rule types ensure correctness – key existence, type checking, interface validation (USN lookup), required fields, dictionary validation, and regex/pattern checks.
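A few of these rule types can be sketched in shell against a flattened key/value dump of the template; the concrete keys and the allowed-value dictionary below are assumptions for illustration, not the platform's actual rule set:

```shell
#!/bin/sh
# Illustrative validation sketch; keys and dictionaries are hypothetical.
cat > cac.flat <<'EOF'
redis.defaultTag.ip=10.0.0.9
redis.defaultTag.port=6379
env.group=product
EOF

# Look up a flattened key's value.
get() { sed -n "s/^$1=//p" cac.flat; }

errors=0

# Required-field rule: the Redis IP must be present and non-empty.
[ -n "$(get redis.defaultTag.ip)" ] || { echo "missing redis ip" >&2; errors=1; }

# Type-checking rule: the port must be numeric.
case "$(get redis.defaultTag.port)" in
  ''|*[!0-9]*) echo "port must be numeric" >&2; errors=1 ;;
esac

# Dictionary rule: env.group must be one of the allowed values.
case "$(get env.group)" in
  simOnline|preview|product) ;;
  *) echo "invalid env.group" >&2; errors=1 ;;
esac

[ "$errors" -eq 0 ] && echo "validation passed"
```

Interface validation (checking a USN against the service registry) would require a lookup call and is omitted here.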
4. Build & Control Scripts:
cac_build.sh – fetched during the CI build, pulls the appropriate configuration branch, performs placeholder substitution, and downloads cac_control.sh .
cac_control.sh – executed at service start‑up to select the correct configuration file for the target data‑center.
Example snippet used in the build pipeline:
curl -s --connect-timeout 1 -m 3 --output "cac_build.sh" http://aaa.bbb.com/cacConfig/api/service/getCacBuildSh && sh cac_build.sh
5. Stability Guarantees: If the CaC service is unavailable, developers can fall back to an offline copy of the scripts. Monitoring alerts trigger if configuration validation fails during deployment.
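The article does not show cac_control.sh itself; as a hedged guess at its shape, a start-up script could select the configuration file matching the target data-center via an environment variable (DEPLOY_DC and the conf/ layout are assumptions for this sketch):

```shell
#!/bin/sh
# Hypothetical start-up selection; DEPLOY_DC and the paths are illustrative.
DC="${DEPLOY_DC:-dc-a}"

# Stand-in for the per-data-center files delivered by the build step.
mkdir -p conf
echo "ip: '10.0.0.1'" > "conf/app.${DC}.yaml"

if [ -f "conf/app.${DC}.yaml" ]; then
  # Point the service at the config for this data-center.
  cp "conf/app.${DC}.yaml" conf/app.yaml
else
  echo "no configuration delivered for ${DC}" >&2
  exit 1
fi
```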
6. Anti-Corrosion Measures: Dynamic traffic detection and static code scanning identify hard-coded environment identifiers that have not been migrated to CaC, preventing regression.
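The static-scanning half can be as simple as pattern matching over the source tree; the identifier patterns and sample file below are purely illustrative:

```shell
#!/bin/sh
# Illustrative static scan; the identifier patterns are hypothetical examples.
mkdir -p src
printf 'if region == "dc-brazil" { useLocalPricing() }\n' > src/pricing.txt

# Any match means an environment identifier is still hard-coded.
hits=$(grep -rnE 'dc-(brazil|mexico|japan)' src || true)
if [ -n "$hits" ]; then
  echo "hard-coded environment identifiers found:" >&2
  echo "$hits" >&2
  scan_status=fail
else
  scan_status=pass
fi
```

In a CI gate, a "fail" status would block the merge until the value is migrated to the CaC repository.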
Results
During the 2023‑2024 international rollout, the CaC platform reduced manual effort by ~80 % and eliminated most coordination bottlenecks. The standardized configuration template and automated delivery pipeline proved essential for scaling multi‑region deployments.
Future Work
To support multi‑cloud scenarios, the template will be extended to cover additional resource types, and stricter governance will be applied to custom extensions to avoid template erosion.
Didi Tech
Official Didi technology account