Didi Tech
Author

Official Didi technology account

Recent Articles

Oct 12, 2023 · Cloud Computing

Elastic Cloud Mixed Deployment: Architecture, Scheduling, Isolation, and Future Directions

Didi's Elastic Cloud uses mixed deployment to co‑locate diverse services, employing tiered guarantees, custom Kubernetes scheduling, profiling, rescheduling, and isolation‑cluster techniques to boost utilization while preserving QoS, with a roadmap for broader automation and interference detection.

Dynamic Scaling · mixed deployment · performance isolation
25 min read
Oct 10, 2023 · Backend Development

Investigation of 300‑Second Redis Timeout Issues in a Go Service

The article details how a Go service’s 300‑second Redis call timeout was traced to a gateway’s full‑NAT session‑table loss, and explains how targeted retries, proper timeout settings, and rate‑limiting can prevent similar cascading failures in distributed systems.

Go · Network Troubleshooting · Redis
9 min read
Sep 26, 2023 · Databases

Didi's Time Series Storage Evolution: From InfluxDB to VictoriaMetrics

Facing exponential growth in time‑series data between 2017 and 2023, Didi migrated from InfluxDB to RRDtool, then to an in‑memory cache layer, and finally adopted VictoriaMetrics. Its ability to run on low‑cost commodity hardware, together with high write throughput, strong compression, and easy horizontal scaling, resolved the earlier storage, OOM, and scalability problems.

TSDB · Time Series Database · VictoriaMetrics
13 min read
Sep 21, 2023 · Cloud Native

OBC: A Cloud-Native Real-Time Computing Engine for Metrics at Didi

To replace costly, duplicated Flink jobs, Didi built Observe‑Compute (OBC), a cloud‑native, PromQL‑driven real‑time metric engine with centralized policy management, scalable containerized workers, and zero‑downtime scaling, achieving million‑RMB annual savings while handling 10 M points per second.

Flink alternative · OBC · PromQL
17 min read
Sep 19, 2023 · Cloud Native

OrangeFS: A Cloud‑Native Multi‑Protocol Distributed Data Lake Storage System

OrangeFS is Didi’s cloud‑native, multi‑protocol distributed data‑lake storage system that unifies POSIX, S3 and HDFS access on a single logical hierarchy, integrates with Kubernetes via a CSI plugin, supports on‑premise and public‑cloud backends, provides multi‑tenant isolation, and dramatically improves elasticity, utilization and latency for petabyte‑scale workloads such as ride‑hailing logs, machine‑learning training, finance and analytics.

CSI · Cloud Native Storage · FUSE
17 min read
Sep 12, 2023 · Operations

Observability: Concepts, Challenges, and Didi’s Implementation

The article explains observability as the ability to infer any system state from external data, contrasts it with traditional monitoring, outlines challenges of high‑dimensional, high‑cardinality data and storage costs, and describes Didi’s hybrid MTL architecture that separates low‑ and high‑cardinality logs and metrics while linking them via TraceIDs to provide detailed, cost‑effective insight and streamlined debugging.

Didi · Microservices · Monitoring
9 min read
Sep 7, 2023 · Cloud Native

Service Management and Resource Abstraction in Cloud‑Native Environments Using OAM and KubeVela

To tackle the exploding number of microservices and heterogeneous infrastructure in cloud‑native enterprises, the article proposes a unified service‑and‑resource abstraction built on the Open Application Model and its implementation KubeVela, enabling declarative application definitions, cost attribution, automated lifecycle management, and cross‑region efficiency through component marketplaces, an application center, an operations platform, and a site‑building center.

KubeVela · OAM · Service Management
13 min read
Sep 5, 2023 · Operations

Observability and Stability Engineering in Didi Ride‑Hailing Platform

At Didi, observability and stability engineering combine automated, AI‑driven alarm generation, distributed tracing, and ChatOps‑based fault handling to manage micro‑service complexity, massive traffic spikes, and cross‑region operations. The article emphasizes systematic investment and the evolution toward AIOps, and closes with a recruitment call for backend and test engineers.

Didi · AIOps · distributed-systems
16 min read
Aug 31, 2023 · Big Data

Data Stability Construction and Fault Governance Practices at Didi Customer Service

Didi’s multi‑year data‑stability program for its customer‑service platform progressed through fault‑centered engineering, business‑aligned cross‑team work, and capability normalization. It instituted pre‑, mid‑ and post‑fault safeguards, clear ownership, automated alerts and repair tools, which cut fault count by 42% and more than halved mean time to repair while improving team communication and satisfaction.

Automation · Data Reliability · Data Warehouse
16 min read