Inside Alibaba’s DRDS: How Distributed Databases Power Double‑11

This interview with Alibaba’s DRDS expert Shen Xun reveals how the distributed relational database service evolved from an internal tool to a cloud‑native solution, detailing its architecture, scalability advantages, middleware ecosystem, and real‑world performance during massive events like Double‑11.

ITPUB

Evolution of DRDS from Internal Middleware to Cloud Service

DRDS (originally TDDL) was created in 2008 to provide horizontal sharding for the Taobao platform. It was designed for low cost, high performance, and strong reliability within Alibaba’s internal environment. With the emergence of Alibaba Cloud, the service was re‑engineered to expose a public API, adopt multi‑tenant isolation, and meet external customers’ SLA expectations.

Core Advantages of a Distributed Relational Database

Horizontal scalability: Adding new MySQL instances increases overall read/write capacity without changing application SQL.

Data safety: Replication across multiple data centers protects against site‑level failures.

High availability: Each node stores only a fragment of the data; a single node failure affects only a small portion of the workload.
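As a rough illustration of why sharding gives both scalability and failure isolation, the sketch below routes rows to shards by hashing a sharding key. The function names, the `user_id` key, and the use of CRC32 are assumptions made for this example; the interview does not describe DRDS's actual routing function.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # crc32 gives the same value in every process, unlike Python's str hash().
    return zlib.crc32(key.encode("utf-8")) % shard_count


def route(rows, shard_count):
    """Group rows by the shard that owns their (hypothetical) user_id key."""
    placement = {index: [] for index in range(shard_count)}
    for row in rows:
        placement[shard_for(row["user_id"], shard_count)].append(row)
    return placement


def affected_fraction(placement, failed_shard):
    """Fraction of all rows that live on the failed shard."""
    total = sum(len(rows) for rows in placement.values())
    return len(placement[failed_shard]) / total if total else 0.0
```

With eight shards and evenly distributed keys, losing one shard touches roughly one eighth of the rows, which is the "small portion of the workload" referred to above.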

Typical Cloud‑Native Workloads

Large‑scale internet services experience sudden traffic spikes and must keep request latency low and predictable while they last. DRDS’s sharding and distributed query engine allow e‑commerce platforms to sustain such workloads, especially during peak events like Alibaba’s Double‑11 shopping festival.

Double‑11 Scaling Procedure

Five days before Double‑11, an external DRDS customer requested a capacity expansion. The following steps were performed:

Increase the logical shard count in the DRDS configuration.

Add new MySQL instances to the target shard group.

Execute drds-admin reload-topology to propagate the new topology.

Validate data redistribution using DRDS’s built‑in consistency check.

After these actions the system handled the traffic surge without service interruption, demonstrating on‑demand elasticity while preserving full SQL compatibility.
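The redistribution step can be reasoned about with a small model. The sketch below assumes modulo placement on a stable hash and a doubling of the shard count; both are assumptions chosen for illustration, not a description of how DRDS redistributes data. Under those assumptions every key either stays on its shard or moves to exactly one new shard, which keeps the migration plan simple to verify.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Stable modulo placement (an assumed scheme, not DRDS's documented one)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def plan_split(keys, old_count):
    """Plan a doubling from old_count to 2 * old_count shards.

    Returns (moves, stays): moves is a list of (key, old_shard, new_shard),
    stays is a list of keys that remain where they are.
    """
    new_count = old_count * 2
    moves, stays = [], []
    for key in keys:
        old = shard_for(key, old_count)
        new = shard_for(key, new_count)
        if new == old:
            stays.append(key)
        else:
            # h % 2N is either h % N or h % N + N, so this is the only target.
            moves.append((key, old, new))
    return moves, stays


def check_redistribution(keys, old_count):
    """Simple consistency check: every key is accounted for exactly once."""
    moves, stays = plan_split(keys, old_count)
    return len(moves) + len(stays) == len(keys)
```

A real consistency check would compare row counts or checksums per shard on the live data; the function here only checks the plan.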

Middleware Ecosystem Supporting DRDS

HSF (High Speed Framework)

Java‑based RPC framework that makes remote method calls appear as local interface invocations, simplifying service‑oriented architecture. In the cloud it is offered as EDAS.
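The idea of a remote call that looks like a local one can be sketched with a dynamic proxy. This is not HSF's API (HSF is a Java framework); the class names, the JSON wire format, and the in-process "transport" are all stand-ins invented for the example.

```python
import json


class OrderService:
    """Server-side implementation (hypothetical service)."""

    def total(self, prices):
        return sum(prices)


def handle_request(service, payload: str) -> str:
    """Server side: decode the request, invoke the method, encode the result."""
    request = json.loads(payload)
    method = getattr(service, request["method"])
    return json.dumps({"result": method(*request["args"])})


class RemoteProxy:
    """Client side: turns attribute access into a serialized request."""

    def __init__(self, transport):
        # transport is any callable that takes a request string and returns
        # a response string; a real framework would send it over the network.
        self._transport = transport

    def __getattr__(self, name):
        def call(*args):
            payload = json.dumps({"method": name, "args": list(args)})
            return json.loads(self._transport(payload))["result"]

        return call


service = OrderService()
orders = RemoteProxy(lambda payload: handle_request(service, payload))
```

The caller writes `orders.total([1, 2, 3])` exactly as it would for a local object, while the proxy handles serialization and dispatch.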

Name Server (Config Server)

Provides service discovery and role broadcasting. When a node assumes a role (e.g., master, replica), it announces its availability to dependent services.
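A minimal in-memory sketch of role broadcasting follows. The `NameServer` class, its method names, and the example addresses are hypothetical; the real Config Server is a networked service whose interface is not described in the interview.

```python
from collections import defaultdict


class NameServer:
    """Tracks which address holds which role and pushes changes to subscribers."""

    def __init__(self):
        self._providers = defaultdict(dict)    # service -> {address: role}
        self._subscribers = defaultdict(list)  # service -> [callback, ...]

    def subscribe(self, service, callback):
        """Register a callback and send it the current view immediately."""
        self._subscribers[service].append(callback)
        callback(dict(self._providers[service]))

    def announce(self, service, address, role):
        """A node declares that it now holds a role for the service."""
        self._providers[service][address] = role
        self._notify(service)

    def withdraw(self, service, address):
        """A node leaves; dependents are told so they stop routing to it."""
        self._providers[service].pop(address, None)
        self._notify(service)

    def _notify(self, service):
        snapshot = dict(self._providers[service])
        for callback in self._subscribers[service]:
            callback(snapshot)
```

Pushing a full snapshot on every change keeps subscribers simple: they replace their local view instead of applying incremental updates.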

ONS (RocketMQ)

Distributed messaging platform that lets services reach eventual consistency through reliable message delivery, supports asynchronous processing, and can sustain more than 1 trillion messages per day during peak events. It decouples upstream transaction creation from downstream business logic.

DRDS (TDDL)

Implements horizontal sharding for relational databases while preserving familiar SQL features such as single‑machine transactions, joins, and cross‑node analytics. It inherits its network protocol and SQL parser from Cobar and its query optimizer from TDDL.

Message Middleware Design for High Throughput

During traffic spikes, messages inevitably accumulate. Systems built on DRDS are therefore designed to expect a backlog and rely on a durable broker to buffer it. A typical transaction flow is:

Trade creation writes a record to the primary DB.

Trade service publishes a message to ONS.

Downstream services (logging, recharge, notification, etc.) consume the message in parallel.

This design keeps the latency of the main transaction path low and isolates failures: if any downstream consumer slows down or crashes, the main transaction remains unaffected.
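The flow above can be sketched with an in-memory stand-in for the broker. The `Broker` class, its per-consumer offsets, and `create_trade` are assumptions for illustration and are not the ONS or RocketMQ client API; consumers are polled one after another here, whereas real consumers run in parallel.

```python
class Broker:
    """Append-only log with one read offset per consumer."""

    def __init__(self):
        self._log = []      # stand-in for durable message storage
        self._offsets = {}  # consumer name -> index of next message to read

    def publish(self, message):
        self._log.append(message)

    def poll(self, consumer, handler):
        """Deliver pending messages; stop at the first failure for later retry.

        Returns the backlog (undelivered message count) for this consumer.
        """
        offset = self._offsets.get(consumer, 0)
        while offset < len(self._log):
            try:
                handler(self._log[offset])
            except Exception:
                break  # leave the offset in place; the message stays buffered
            offset += 1
        self._offsets[consumer] = offset
        return len(self._log) - offset


def create_trade(db, broker, trade_id, amount):
    """Main path: write the record, publish the event, return immediately."""
    db[trade_id] = amount
    broker.publish({"trade_id": trade_id, "amount": amount})
```

Because each consumer has its own offset, a failing recharge service only grows its own backlog. The notification service and the trade write path keep working.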

Cold‑Data Handling

After peak periods, data that is no longer hot is migrated to lower‑cost storage:

Analytical data is off‑loaded to Alibaba ODPS (big‑data platform).

Long‑term transaction logs are stored in inexpensive file‑engine storage.

This strategy balances performance for active workloads with cost‑effective retention for historical data.
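A hot/cold split of this kind usually comes down to an age cutoff. The sketch below assumes a 90-day retention window and a `created_at` field on each row; both are hypothetical, as the interview gives no concrete policy.

```python
from datetime import datetime, timedelta


def split_hot_cold(rows, now, retention=timedelta(days=90)):
    """Separate rows that stay in the online database from rows to archive."""
    cutoff = now - retention
    hot = [row for row in rows if row["created_at"] >= cutoff]
    cold = [row for row in rows if row["created_at"] < cutoff]
    return hot, cold
```

Rows in `cold` would be copied to the cheaper store and deleted from the online shards only after the copy has been verified.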

Comparison with NoSQL Solutions

DRDS aims to combine NoSQL‑style scalability with full SQL compatibility. Unlike pure NoSQL systems that sacrifice relational features, DRDS retains:

Single‑machine ACID transactions.

Local and distributed JOIN operations.

Cross‑shard analytical queries.

Migrating from single‑node MySQL to DRDS therefore requires minimal application changes.
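A cross‑shard analytical query is typically answered by scatter‑gather: each shard computes a partial result and a coordinator merges them. The sketch below uses in-memory SQLite databases as stand-ins for MySQL shards, with an `orders` table invented for the example. It is not DRDS's query engine.

```python
import sqlite3


def make_shard(rows):
    """Create one in-memory shard holding a slice of the orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", rows)
    return conn


def average_amount(shards):
    """Compute AVG(amount) across shards from per-shard COUNT and SUM."""
    total, count = 0.0, 0
    for conn in shards:
        shard_count, shard_sum = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders"
        ).fetchone()
        count += shard_count
        total += shard_sum
    # Averaging the per-shard averages would be wrong when shards differ in size.
    return total / count if count else None
```

The application still issues a single logical query; decomposing it into per-shard COUNT and SUM and merging the results is the coordinator's job.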

Operational Model Transition

Moving from an internal team to a cloud‑facing service required:

Redesign of the operations model to provide self‑service provisioning, automated scaling, and multi‑tenant isolation.

Removal of internal‑only tooling and replacement with cloud‑native APIs.

Implementation of safety mechanisms (rate limiting, back‑pressure) to protect against traffic that exceeds provisioned capacity.

These changes enabled DRDS to serve hundreds of external customers with the same reliability demonstrated in Alibaba’s internal Double‑11 deployments.
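Rate limiting of the kind mentioned above is commonly implemented as a token bucket. The sketch below is a generic version with an injectable clock so it can be tested deterministically; the class and its parameters are assumptions, and the interview does not say which algorithm DRDS uses.

```python
class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of requests/second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now, cost=1.0):
        """Return True if the request may proceed at time `now` (in seconds)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject or queue the request
```

A tenant provisioned for 2 requests per second would get `TokenBucket(rate=2.0, capacity=2.0)`; requests beyond that are refused rather than allowed to overload the shared MySQL instances.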
