
Alibaba's Massive Data Architecture: Sharding, Multi‑Data‑Center Synchronization, and Caching Strategies

The article describes how Alibaba scaled its massive data platform by horizontally sharding Oracle tables to MySQL, employing multi‑data‑center synchronization tools such as Erosa, Eromanga, and Otter, and designing multi‑level caching and proxy solutions like Cobar to ensure high availability and performance.

Architect

Author Tao Yong joined Alibaba in 2005 and later led the distributed database team. He explains the challenges of massive data growth across Alibaba's multiple sites (Chinese, international, Japanese), where core tables reach billions of rows and cause capacity, performance, and distribution problems.

Initially relying on Oracle, Alibaba recognized bottlenecks around 2007‑2008 and began splitting databases. Horizontal sharding moved core tables to MySQL clusters, alleviating CPU and I/O pressure while improving TPS and capacity.

The strategy involves two main actions: distributing load to transform a centralized architecture into a distributed one, and using varied storage solutions (Oracle, MySQL, KV stores, NoSQL) based on data characteristics.

Horizontal sharding is the primary method for B2B, moving large tables from Oracle to MySQL; vertical sharding is used as a transitional step. Multiple storage options are selected per business need, with Oracle for core data, MySQL for high‑volume tables, and KV stores for less relational data.
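
To make the idea of horizontal sharding concrete, here is a minimal sketch in Python. It assumes a hash-modulo placement rule, a shard count of four, and a `member_id` sharding key; all three are illustrative assumptions, since the article does not say which key or rule Alibaba used.

```python
import zlib

SHARD_COUNT = 4  # hypothetical number of MySQL shards


def shard_for(member_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a sharding key to a shard index with a stable hash."""
    # crc32 gives the same result in every process, unlike hash() on str.
    return zlib.crc32(member_id.encode("utf-8")) % shard_count


def place_rows(rows, shard_count: int = SHARD_COUNT):
    """Group rows by shard so each group can go to its own MySQL cluster."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[shard_for(row["member_id"], shard_count)].append(row)
    return shards


# Hypothetical rows from one large table, spread across the shards.
rows = [{"member_id": f"m{i}", "offer": f"offer-{i}"} for i in range(1000)]
placement = place_rows(rows)
```

Every row of the original table ends up on exactly one shard, so each MySQL cluster holds only a fraction of the data and serves only a fraction of the load.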

Three Alibaba products address these challenges: Erosa parses MySQL bin‑log in real time; Eromanga publishes the parsed changes for subscription by downstream systems; Otter synchronizes data across IDC sites, providing bidirectional, transaction‑level replication similar to SharePlex.
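
The division of labor described above (parse changes, publish them, let downstream systems subscribe and apply them) can be sketched as a tiny in-process pipeline. This is not the API of Erosa, Eromanga, or Otter; `ChangeEvent`, `ChangeBus`, and `ReplicaTable` are hypothetical names, and the events are hand-written rather than parsed from a real bin-log.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    """One row change, as a bin-log parser might emit it (simplified)."""
    table: str
    op: str   # "insert", "update" or "delete"
    key: int
    row: dict


class ChangeBus:
    """Publishes parsed changes to every subscriber of a table."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[ChangeEvent], None]]] = defaultdict(list)

    def subscribe(self, table: str, handler: Callable[[ChangeEvent], None]) -> None:
        self._subscribers[table].append(handler)

    def publish(self, event: ChangeEvent) -> None:
        for handler in self._subscribers.get(event.table, []):
            handler(event)


class ReplicaTable:
    """A downstream copy that applies changes in the order received."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}

    def apply(self, event: ChangeEvent) -> None:
        if event.op == "delete":
            self.rows.pop(event.key, None)
        else:  # insert and update both overwrite the row
            self.rows[event.key] = dict(event.row)


bus = ChangeBus()
replica = ReplicaTable()
bus.subscribe("offer", replica.apply)

bus.publish(ChangeEvent("offer", "insert", 1, {"title": "old"}))
bus.publish(ChangeEvent("offer", "insert", 2, {"title": "temp"}))
bus.publish(ChangeEvent("offer", "update", 1, {"title": "new"}))
bus.publish(ChangeEvent("offer", "delete", 2, {}))
```

After the four events the replica holds only row 1 with its updated title, which is the state a subscriber would expect from replaying the source's change stream in order.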

Data‑sync conflicts are handled by prioritizing one site’s data over another, with future plans for more sophisticated conflict‑resolution policies.
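
A site-priority rule of this kind is simple to express in code. The sketch below assumes two hypothetical sites, `site_a` and `site_b`, with `site_a` winning any conflict; the article does not name which site takes priority or how Otter encodes the rule.

```python
from dataclasses import dataclass

# Hypothetical priority table: the lower number wins a conflict.
SITE_PRIORITY = {"site_a": 0, "site_b": 1}


@dataclass(frozen=True)
class RowVersion:
    site: str
    key: int
    value: str


def resolve(local: RowVersion, remote: RowVersion, priority=SITE_PRIORITY) -> RowVersion:
    """Pick the version from the higher-priority site for the same row."""
    if local.key != remote.key:
        raise ValueError("versions refer to different rows")
    return min((local, remote), key=lambda version: priority[version.site])


# Both sites changed row 1 at about the same time; site_a's write is kept.
winner = resolve(RowVersion("site_b", 1, "edited in B"),
                 RowVersion("site_a", 1, "edited in A"))
```

The rule is deterministic, so both data centers reach the same result independently, at the cost of silently discarding the lower-priority site's write.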

Caching is implemented in three layers—from front‑end image and page caches to back‑end local and remote (distributed) caches. Alibaba uses various cache engines (Berkeley DB, Memcached) and a unified front‑end wrapper to maximize hit rates and manage cache lifecycles.
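
The back-end part of this design, a local cache in front of a shared remote cache in front of the database, can be sketched as follows. The in-process `TTLCache` stands in for both tiers here; in a real deployment the remote tier would be a network service such as Memcached. Class names, TTL values, and the loader function are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """A small key-value cache whose entries expire after a fixed lifetime."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self._clock() + self._ttl, value)


class TieredCache:
    """Look in the local cache, then the remote cache, then the database."""

    def __init__(self, local: TTLCache, remote: TTLCache, loader: Callable[[str], str]):
        self.local = local
        self.remote = remote
        self.loader = loader
        self.loader_calls = 0  # counts how often the database was hit

    def get(self, key: str) -> str:
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.remote.get(key)
        if value is None:
            self.loader_calls += 1
            value = self.loader(key)
            self.remote.set(key, value)
        self.local.set(key, value)
        return value


def load_from_db(key: str) -> str:
    return f"row-for-{key}"  # placeholder for a real database query


remote = TTLCache(ttl_seconds=300.0)  # shared by all application nodes
node_a = TieredCache(TTLCache(30.0), remote, load_from_db)
node_b = TieredCache(TTLCache(30.0), remote, load_from_db)

first = node_a.get("offer:1")   # misses both tiers, loads from the database
second = node_b.get("offer:1")  # misses locally, hits the shared remote tier
```

Only the first request reaches the database. The second node is served from the shared tier and then keeps its own short-lived local copy.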

The team responsible for breaking large Oracle tables into MySQL shards is nicknamed the "demolition team" (拆迁大队). It splits data at the field, table, and schema level, and the splits are often orchestrated through Cobar, Alibaba's proprietary MySQL proxy, which routes queries according to custom sharding rules.
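
Rule-based routing in a sharding proxy can be illustrated with a short sketch. It does not reproduce Cobar's configuration format or internals; `ShardRouter`, the node names, and the modulo rule are hypothetical, and a query is reduced to a table name plus its equality conditions.

```python
from typing import Callable, Dict, List, Tuple


class ShardRouter:
    """Decide which backend nodes a query must be sent to."""

    def __init__(self, shards: List[str]) -> None:
        self._shards = shards
        # table name -> (sharding column, function from column value to an int)
        self._rules: Dict[str, Tuple[str, Callable[[int], int]]] = {}

    def add_rule(self, table: str, column: str, fn: Callable[[int], int]) -> None:
        self._rules[table] = (column, fn)

    def route(self, table: str, where: Dict[str, int]) -> List[str]:
        rule = self._rules.get(table)
        if rule is None:
            # Unsharded tables live on a single default node.
            return [self._shards[0]]
        column, fn = rule
        if column not in where:
            # No sharding key in the query: it has to go to every shard.
            return list(self._shards)
        return [self._shards[fn(where[column]) % len(self._shards)]]


router = ShardRouter(["mysql-0", "mysql-1", "mysql-2", "mysql-3"])
router.add_rule("offer", "member_id", lambda value: value)

single = router.route("offer", {"member_id": 10})  # 10 % 4 == 2
broadcast = router.route("offer", {"category": 7})
default = router.route("config", {})
```

Queries that carry the sharding key touch one node, while queries without it fan out to all shards, which is why the choice of sharding column matters so much for performance.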

Data migration follows a strict five‑step process, gradually moving reads/writes from Oracle to MySQL while ensuring minimal impact on users, typically performed during short maintenance windows.
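
The article does not list the five steps, so the sketch below does not attempt to reproduce them. It only illustrates one common way to shift read traffic gradually: each user is hashed into a stable bucket, and a configurable percentage of buckets reads from MySQL while the rest stay on Oracle. The function name and the percentage knob are assumptions.

```python
import zlib


def read_backend(user_id: str, mysql_percent: int) -> str:
    """Send a stable slice of users to MySQL; the rest keep reading Oracle."""
    if not 0 <= mysql_percent <= 100:
        raise ValueError("mysql_percent must be between 0 and 100")
    # A stable hash keeps each user in the same bucket across requests.
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return "mysql" if bucket < mysql_percent else "oracle"


users = [f"user-{i}" for i in range(500)]
on_mysql_at_10 = {u for u in users if read_backend(u, 10) == "mysql"}
on_mysql_at_50 = {u for u in users if read_backend(u, 50) == "mysql"}
```

Because the bucket is fixed per user, raising the percentage only adds users to the MySQL side and never moves anyone back, which keeps each user's experience consistent during the ramp.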

Backend data warehousing (DW) leverages Oracle RAC, Greenplum, and Cobar for both offline and near‑real‑time analytics, with recent improvements reducing data latency from a day to five minutes thanks to Erosa.

Overall, Alibaba aims to build a unified platform that serves both OLTP and OLAP workloads, combining distributed database techniques, high‑performance sharding, and robust synchronization to meet the demands of massive, globally distributed traffic.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
