Configuration‑as‑Code Platform for Multi‑Region Deployment: Design, Implementation and Practices
To overcome Didi’s costly, months‑long, manual multi‑region rollouts, the team built a Configuration‑as‑Code platform that isolates environment settings in a dedicated repository, enforces a versioned template with validation rules, and integrates automated placeholder substitution into CI/CD, cutting manual effort by roughly 80 % and removing coordination bottlenecks.
Background
With the rapid international expansion of Didi’s ride‑hailing services, the company now operates in many countries and needs to deploy services across multiple data‑centers to reduce latency and improve user experience. From 2020 onward, dozens of data‑center deployments have been performed. Early deployments required extensive manual effort: business developers (RD) had to enumerate resources, request them, adapt code for the new data‑center, perform the deployment, and conduct integration testing. A typical deployment involved hundreds of modules and lasted 2‑3 months, with dozens of people collaborating, leading to high error rates and huge coordination costs.
As cloud‑native technologies mature, the number of micro‑services continues to grow, making it essential to identify the root causes of deployment inefficiency.
Root‑Cause Analysis
When delivering a batch of services, RD work focuses on three areas:
Resource inventory and request
Code changes to adapt to the new environment
Deployment and integration testing
The first item is already addressed by the Application Center. The remaining two items—massive code changes for environment adaptation and low deployment efficiency—are the primary problems this article tackles.
Step 1: Identify Scattered Environment Differences
Environment differences appear in three main forms: configuration files that differ per data-center, values hard-coded in source, and business-logic branches that check environment identifiers. Because these differences are scattered across many services, they must be re-inventoried for every new data-center deployment. Consolidating all differences into a single configuration set per service would bound the inventory work, but storing configuration and code in the same Git repository makes that practice hard to enforce.
Step 2: Adapt Services to the New Environment
Each service has numerous downstream dependencies (Redis, MySQL, Kafka, etc.). Different data‑centers require different connection strings, credentials, time‑outs, etc. Historically, each RD handled these adaptations individually, leading to duplicated communication and coordination overhead.
Step 3: Deploy After Adaptation
After all services finish their adaptations, they are deployed in parallel. However, the overall timeline is dictated by the slowest service; faster services wait weeks for slower ones, and any code defect forces a re‑synchronisation of the whole batch.
Summary of Problems
Environment differences are scattered and tightly coupled with business code, requiring repeated manual inventory.
There is no standard for representing environment differences, causing information silos and duplicated effort.
Large numbers of developers perform manual deployments, leading to coordination bottlenecks and frequent rework.
Solution Idea
To address Problem 1, configuration must be completely separated from code, stored in a dedicated repository, and the code must never contain environment‑specific identifiers. This aligns with the Twelve‑Factor App principle of separating config from code.
To address Problem 2, a standardized configuration template is defined, covering the most common environment‑specific items (resource middleware, credentials, service endpoints, etc.). The template is versioned and validated against a set of rules.
To address Problem 3, an automated delivery pipeline is built that pulls the separated configuration, substitutes placeholders in the code, and performs a single CI/CD build that includes both code and configuration. If any placeholder cannot be resolved, the build fails, preventing bad configurations from reaching production.
Implementation Details
1. Configuration Separation: All environment-specific values are moved to a dedicated Git repository (one repo per service, keyed by USN, the Unique Service Name). The code references these values via placeholders of the form ^cac{{key1.key2}}, where key1.key2 is the full path to a leaf node in the YAML configuration.
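As a minimal sketch of how placeholder expansion and the fail-fast behavior described below might fit together (the file names, keys, and values here are illustrative, not Didi's actual tooling), a build step can expand ^cac{{key1.key2}} placeholders from a flattened key/value view of the YAML and abort when anything is left unresolved:

```shell
#!/bin/sh
# Illustrative sketch only: file names, keys, and values are hypothetical.
set -e

# Flattened view of the per-service YAML config (key1.key2=value).
cat > cac.flat <<'EOF'
mysql.defaultTag.ip=10.0.0.5
mysql.defaultTag.port=3306
EOF

# Source file containing ^cac{{...}} placeholders.
cat > app.conf.tpl <<'EOF'
db_host=^cac{{mysql.defaultTag.ip}}
db_port=^cac{{mysql.defaultTag.port}}
EOF

# Substitute each known key into a working copy.
cp app.conf.tpl app.conf
while IFS='=' read -r key val; do
  sed -i "s|\^cac{{$key}}|$val|g" app.conf
done < cac.flat

# Fail the build if any placeholder is still unresolved.
if grep -q 'cac{{' app.conf; then
  echo "build failed: unresolved placeholder" >&2
  exit 1
fi
```

A real implementation would pull the flattened view from the configuration repository for the target data-center, but the fail-on-unresolved check is the key property: a typo in a key name breaks the build instead of reaching production.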
2. Standard Template (excerpt):
#1. Resource Middleware
mysql:
  defaultTag:
    username: ''
    password: ''
    database: '' # no cluster difference, delivery source
    ip: ''
    port: 0
    #... other fields ...
redis:
  defaultTag: # tagName
    ip: ''
    port: 0 # must be numeric
#2. Environment Variables
env:
  physical: '' # sim/pre/small/online
  group: '' # simOnline/preview/product
#3. Generic Downstream Services
microService:
  ibt-xxx-usn:
    ip: ''
    port: 0
    domain: '' # domain name
#4. Custom Extensions
extend:
  key: 'value'
3. Validation Rules: Six rule types ensure correctness – key existence, type checking, interface validation (USN lookup), required fields, dictionary validation, and regex/pattern checks.
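A few of these rule types can be sketched in shell against a flattened key/value dump of the template; the concrete keys and the allowed-value dictionary below are assumptions for illustration, not the platform's actual rule set:

```shell
#!/bin/sh
# Illustrative validation sketch; keys and dictionaries are hypothetical.
cat > cac.flat <<'EOF'
redis.defaultTag.ip=10.0.0.9
redis.defaultTag.port=6379
env.group=product
EOF

# Look up a flattened key's value.
get() { sed -n "s/^$1=//p" cac.flat; }

errors=0

# Required-field rule: the Redis IP must be present and non-empty.
[ -n "$(get redis.defaultTag.ip)" ] || { echo "missing redis ip" >&2; errors=1; }

# Type-checking rule: the port must be numeric.
case "$(get redis.defaultTag.port)" in
  ''|*[!0-9]*) echo "port must be numeric" >&2; errors=1 ;;
esac

# Dictionary rule: env.group must be one of the allowed values.
case "$(get env.group)" in
  simOnline|preview|product) ;;
  *) echo "invalid env.group" >&2; errors=1 ;;
esac

[ "$errors" -eq 0 ] && echo "validation passed"
```

Interface validation (checking a USN against the service registry) would require a lookup call and is omitted here.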
4. Build & Control Scripts:
cac_build.sh – fetched during the CI build, pulls the appropriate configuration branch, performs placeholder substitution, and downloads cac_control.sh .
cac_control.sh – executed at service start‑up to select the correct configuration file for the target data‑center.
Example snippet used in the build pipeline:
curl -s --connect-timeout 1 -m 3 --output "cac_build.sh" http://aaa.bbb.com/cacConfig/api/service/getCacBuildSh && sh cac_build.sh
5. Stability Guarantees: If the CaC service is unavailable, developers can fall back to an offline copy of the scripts. Monitoring alerts trigger if configuration validation fails during deployment.
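The article does not show cac_control.sh itself; as a hedged guess at its shape, a start-up script could select the configuration file matching the target data-center via an environment variable (DEPLOY_DC and the conf/ layout are assumptions for this sketch):

```shell
#!/bin/sh
# Hypothetical start-up selection; DEPLOY_DC and the paths are illustrative.
DC="${DEPLOY_DC:-dc-a}"

# Stand-in for the per-data-center files delivered by the build step.
mkdir -p conf
echo "ip: '10.0.0.1'" > "conf/app.${DC}.yaml"

if [ -f "conf/app.${DC}.yaml" ]; then
  # Point the service at the config for this data-center.
  cp "conf/app.${DC}.yaml" conf/app.yaml
else
  echo "no configuration delivered for ${DC}" >&2
  exit 1
fi
```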
6. Anti-Corrosion Measures: Dynamic traffic detection and static code scanning identify hard-coded environment identifiers that have not been migrated to CaC, preventing regression.
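The static-scanning half can be as simple as pattern matching over the source tree; the identifier patterns and sample file below are purely illustrative:

```shell
#!/bin/sh
# Illustrative static scan; the identifier patterns are hypothetical examples.
mkdir -p src
printf 'if region == "dc-brazil" { useLocalPricing() }\n' > src/pricing.txt

# Any match means an environment identifier is still hard-coded.
hits=$(grep -rnE 'dc-(brazil|mexico|japan)' src || true)
if [ -n "$hits" ]; then
  echo "hard-coded environment identifiers found:" >&2
  echo "$hits" >&2
  scan_status=fail
else
  scan_status=pass
fi
```

In a CI gate, a "fail" status would block the merge until the value is migrated to the CaC repository.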
Results
During the 2023‑2024 international rollout, the CaC platform reduced manual effort by ~80 % and eliminated most coordination bottlenecks. The standardized configuration template and automated delivery pipeline proved essential for scaling multi‑region deployments.
Future Work
To support multi‑cloud scenarios, the template will be extended to cover additional resource types, and stricter governance will be applied to custom extensions to avoid template erosion.
Didi Tech
Official Didi technology account