
How YouPai.com Scaled Its Photo Database with Sharding and Replication

This article analyzes YouPai.com's evolution from a simple master‑slave setup to horizontal sharding using a mapping table, detailing migration steps, access flow, and challenges such as cross‑database joins, consistency, and ID generation.

Java High-Performance Architecture

Database Evolution

Initially the system used a single master‑slave pair in which the slave served only for backup and disaster recovery; when the master failed, the slave was promoted manually. As load grew, memcached was added as a cache, then the architecture moved to one master with multiple slaves, and finally to database sharding.

Sharding Strategies

Sharding can be performed vertically or horizontally. Vertical sharding splits data by functional module, so each database holds tables with different structures. Horizontal sharding splits the rows of the same table across multiple databases with identical schemas. Because photo data dominates the workload, YouPai.com adopted horizontal sharding.

Sharding Rules

Common approaches map a column value to a specific database by range or by hash, e.g., user IDs 0‑9999 to DB A and 10000‑19999 to DB B. This is easy to implement but hampers scalability: adding nodes requires changing the algorithm or moving large amounts of data.
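To make the scalability problem concrete, here is a minimal Python sketch of the two rule-based approaches. The function names, the range size of 10,000, and the modulo rule standing in for "hash" are illustrative assumptions, not details from YouPai.com.

```python
def shard_by_range(user_id: int, range_size: int = 10_000) -> int:
    """Range rule: IDs 0-9999 go to shard 0, 10000-19999 to shard 1, and so on."""
    return user_id // range_size


def shard_by_hash(user_id: int, shard_count: int) -> int:
    """Modulo rule (a simple stand-in for hashing): depends on the shard count."""
    return user_id % shard_count


def moved_fraction(user_ids, old_count: int, new_count: int) -> float:
    """Fraction of users whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(
        1
        for uid in ids
        if shard_by_hash(uid, old_count) != shard_by_hash(uid, new_count)
    )
    return moved / len(ids)


if __name__ == "__main__":
    print(shard_by_range(12_345))                 # shard index under the range rule
    print(moved_fraction(range(100_000), 4, 5))   # share of users that must move
```

Under these assumptions, growing from 4 to 5 shards relocates 80% of the sampled user IDs, which is the "massive data movement" the rule-based approach suffers from.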

YouPai.com instead uses a mapping table: an index table stores the relationship between each user ID and the database ID. The table is cached for fast lookup. When a new user registers, a database is chosen at random and the mapping is recorded.
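The mapping-table idea can be sketched in Python as follows. This is only an illustration: the class name `ShardDirectory`, its methods, and the use of plain dictionaries in place of the real index table and cache are all assumptions.

```python
import random


class ShardDirectory:
    """Hypothetical mapping table: user ID -> database ID, with a lookup cache."""

    def __init__(self, database_ids, rng=None):
        self._database_ids = list(database_ids)
        self._index_table = {}  # stands in for the persistent index table
        self._cache = {}        # stands in for the cache in front of it
        self._rng = rng or random.Random()

    def add_database(self, db_id):
        """New nodes only affect future registrations; existing mappings stay put."""
        self._database_ids.append(db_id)

    def register_user(self, user_id):
        """Pick a database at random for a new user and record the mapping."""
        if user_id in self._index_table:
            raise ValueError(f"user {user_id} is already registered")
        db_id = self._rng.choice(self._database_ids)
        self._index_table[user_id] = db_id
        self._cache[user_id] = db_id
        return db_id

    def database_for(self, user_id):
        """Resolve a user's database, consulting the cache first."""
        db_id = self._cache.get(user_id)
        if db_id is None:
            db_id = self._index_table[user_id]  # KeyError for unknown users
            self._cache[user_id] = db_id
        return db_id
```

Because the routing decision is stored rather than computed, adding a database changes nothing for existing users, which is what makes this scheme easier to scale than a fixed range or hash rule.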

Data Migration

To rebalance load across nodes, data migration follows these steps:

1. Mark the user as "migrating"; writes are blocked and a notice is shown.
2. Copy all of the user's data to the newly added node.
3. Update the mapping table with the new database ID.
4. Set the user's status back to normal.
5. Delete the user's data from the original database.

The migration is scheduled at night to minimize impact on users.
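The migration steps above can be sketched as follows. The `Cluster` class, its in-memory dictionaries, and the function `migrate_user` are hypothetical stand-ins for the real databases and mapping table; the sketch only shows the ordering of the steps.

```python
from enum import Enum


class UserStatus(Enum):
    NORMAL = "normal"
    MIGRATING = "migrating"


class Cluster:
    """In-memory stand-in for the shards, the mapping table, and user status."""

    def __init__(self, database_ids):
        self.databases = {db_id: {} for db_id in database_ids}  # db -> user -> photos
        self.mapping = {}  # user ID -> database ID
        self.status = {}   # user ID -> UserStatus

    def add_user(self, user_id, db_id):
        self.mapping[user_id] = db_id
        self.status[user_id] = UserStatus.NORMAL
        self.databases[db_id][user_id] = []

    def write_photo(self, user_id, photo):
        # Writes are refused while the user's data is being moved.
        if self.status[user_id] is UserStatus.MIGRATING:
            raise RuntimeError("user data is being migrated; try again later")
        self.databases[self.mapping[user_id]][user_id].append(photo)

    def read_photos(self, user_id):
        return list(self.databases[self.mapping[user_id]][user_id])


def migrate_user(cluster, user_id, target_db):
    source_db = cluster.mapping[user_id]
    if source_db == target_db:
        return
    cluster.status[user_id] = UserStatus.MIGRATING           # step 1: block writes
    try:
        data = list(cluster.databases[source_db][user_id])
        cluster.databases[target_db][user_id] = data         # step 2: copy
        cluster.mapping[user_id] = target_db                 # step 3: repoint mapping
    finally:
        cluster.status[user_id] = UserStatus.NORMAL          # step 4: unblock writes
    del cluster.databases[source_db][user_id]                # step 5: delete old copy
```

Deleting the source copy last means a failure during the copy leaves the original data and mapping untouched, so the user can simply be migrated again later.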

Data Access Process

Issues Introduced by Sharding

Cross‑database joins: Queries that need data from multiple databases cannot use a single JOIN; they require multiple queries and application‑level aggregation.
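As an illustration of application-level aggregation, the sketch below fetches the most recent photos for a set of users whose data lives on different shards. The `photos` schema, the function names, and the use of in-memory SQLite databases as shards are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photos ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, created_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO photos (id, user_id, title, created_at) VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def recent_photos(user_ids, mapping, shards, limit=10):
    """Query each shard separately, then merge and sort in the application."""
    users_by_shard = defaultdict(list)
    for uid in user_ids:
        users_by_shard[mapping[uid]].append(uid)

    merged = []
    for db_id, uids in users_by_shard.items():
        placeholders = ",".join("?" * len(uids))
        sql = (
            "SELECT user_id, title, created_at FROM photos "
            f"WHERE user_id IN ({placeholders}) "
            "ORDER BY created_at DESC LIMIT ?"
        )
        merged.extend(shards[db_id].execute(sql, (*uids, limit)).fetchall())

    # The final ordering and limit must be re-applied across all shards.
    merged.sort(key=lambda row: row[2], reverse=True)
    return merged[:limit]
```

Each shard returns at most `limit` rows, so the application never merges more than `limit` rows per shard before trimming the combined result.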

Consistency: There are no foreign‑key constraints or distributed transactions across shards, so atomicity must be handled manually, often with separate transactions on each database.
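One possible way to coordinate separate transactions is sketched below; it is not necessarily what YouPai.com does. The helper `write_to_both` and its callback parameters are hypothetical, and in-memory SQLite connections stand in for two shards.

```python
import sqlite3
from typing import Callable

Work = Callable[[sqlite3.Connection], None]


def write_to_both(conn_a: sqlite3.Connection, conn_b: sqlite3.Connection,
                  work_a: Work, work_b: Work) -> None:
    """Run one local transaction per database; commit only if both succeed.

    This is best effort, not a distributed transaction: if the second commit
    fails after the first has succeeded, the application must repair the
    first database itself.
    """
    try:
        work_a(conn_a)
        work_b(conn_b)
    except Exception:
        # Neither transaction has been committed yet, so both can be undone.
        conn_a.rollback()
        conn_b.rollback()
        raise
    conn_a.commit()
    conn_b.commit()
```

Delaying both commits until all statements have run shrinks the window for inconsistency to the gap between the two commits, but it does not remove it.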

Auto‑increment IDs: A dedicated ID database with a single auto‑increment column generates unique IDs. Periodic cleanup of this table is needed to maintain performance.
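A dedicated ID table can be sketched like this. SQLite is used here only as a convenient stand-in for the real ID database, and the table name `id_sequence` and class `IdGenerator` are assumptions.

```python
import sqlite3


class IdGenerator:
    """Hands out unique IDs from a table with a single auto-increment column."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        # AUTOINCREMENT makes SQLite remember the highest ID ever issued,
        # so IDs are not reused after rows are deleted.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS id_sequence "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT)"
        )

    def next_id(self) -> int:
        """Insert an empty row and return the ID the database assigned to it."""
        with self._conn:  # commits on success, rolls back on error
            cursor = self._conn.execute("INSERT INTO id_sequence DEFAULT VALUES")
            return cursor.lastrowid

    def cleanup(self) -> None:
        """Periodic cleanup: drop the accumulated rows; the counter is kept."""
        with self._conn:
            self._conn.execute("DELETE FROM id_sequence")


if __name__ == "__main__":
    generator = IdGenerator(sqlite3.connect(":memory:"))
    first, second = generator.next_id(), generator.next_id()
    generator.cleanup()
    third = generator.next_id()
    print(first, second, third)  # IDs keep increasing across the cleanup
```

The rows themselves carry no information once an ID has been handed out, which is why they can be deleted periodically without affecting uniqueness.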

Database Layout

The overall system consists of several sub‑databases. Each sub‑database is backed by two physical servers configured for master‑master replication, but only one of the two is active at a time. To save costs, the standby server hosts two sub‑databases.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
