How to Design and Test Horizontal Sharding for Large-Scale Databases
This article outlines the background, analysis, design, testing strategy, migration plan, and production quality results for implementing horizontal sharding, table splitting, and Elasticsearch integration in a growing enterprise database system.
1. Background
As the company grows, user volume and business complexity increase, requiring each business team to migrate relevant tables from a unified database to their own databases for easier maintenance.
As the middle platform develops and data volume grows, the single-database architecture leads to slow responses and timeouts, prompting horizontal sharding and Elasticsearch integration.
2. Analysis and Design
2.1 Horizontal Sharding
Reason: Table data will keep growing, slowing queries, so horizontal splitting based on primary‑key ID is needed.
Problem: Multiple tables share a single auto‑increment ID.
Solution: Use tddl-sequence; each node obtains a distinct ID segment, guaranteeing no primary-key conflicts.
Risk considerations: Verify that tddl‑sequence fully prevents conflicts by studying conflict cases.
Investigation points:
Manual inserts causing primary‑key conflicts.
Modification of the sequence value leading to conflicts.
Multi‑threaded code causing conflicts.
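The segment-allocation idea behind tddl-sequence can be sketched as follows. This is not the middleware's actual implementation (which persists segments via a sequence table and an atomic compare-and-set UPDATE); it is a minimal illustration, with an AtomicLong standing in for the shared counter, of why nodes holding disjoint ID ranges cannot produce primary-key conflicts:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of segment-based ID allocation (the idea behind tddl-sequence).
// Each application node reserves a contiguous range from a shared counter and
// hands out IDs locally; ranges never overlap, so primary keys never conflict.
public class SegmentSequence {
    static final int SEGMENT_SIZE = 1000;
    // Stands in for the shared sequence store that the real middleware
    // updates with an atomic compare-and-set UPDATE.
    static final AtomicLong globalCounter = new AtomicLong(0);

    private long next; // next ID to hand out
    private long max;  // exclusive upper bound of the reserved segment

    public synchronized long nextId() {
        if (next >= max) { // segment exhausted: reserve a fresh one
            next = globalCounter.getAndAdd(SEGMENT_SIZE);
            max = next + SEGMENT_SIZE;
        }
        return next++;
    }
}
```

The investigation points above map onto this model: manual inserts or hand-edited sequence values can land inside a segment another node already reserved, which is exactly the conflict case worth verifying.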
2.2 Number of Shards and Sharding Fields
Tables x and y are split horizontally. Table x (10 million rows, growing by 20,000 per day) suggests 8 shards, usable for about 4.1 years. Table y (50 million rows, growing by 50,000 per day) suggests 32 shards, usable for about 5.5 years.
Calculation: one shard stores up to 5 million rows. Example for x: 8 × 5 million = 40 million total capacity; 40 million − 10 million existing rows = 30 million of headroom; 30 million ÷ 20,000 per day = 1,500 days ≈ 4.1 years.
After sharding, fields remain unchanged.
Risk: Define important fields such as primary keys as BIGINT in both DB and ES.
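The capacity arithmetic above generalizes to a one-line formula; here is a small sketch of it (class and method names are illustrative, not from the project):

```java
// Sketch of the shard-capacity estimate from section 2.2: with a cap of
// 5 million rows per shard, the remaining headroom divided by daily growth
// gives the number of days until the shards fill up.
public class ShardCapacity {
    static final long ROWS_PER_SHARD = 5_000_000L;

    // Days the sharded table can absorb growth before reaching capacity.
    public static long daysUntilFull(int shards, long existingRows, long dailyGrowth) {
        long capacity = shards * ROWS_PER_SHARD;
        return (capacity - existingRows) / dailyGrowth;
    }
}
```

For table x this yields (8 × 5M − 10M) ÷ 20,000 = 1,500 days, matching the 4.1-year estimate.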
2.3 Sharding Strategy
Since only one database exists after migration, sharding strategy is based on actual shard count.
Table x: 8 shards, strategy “primary‑key % 8”. Table y: 32 shards, strategy “primary‑key % 32”.
Risk: With non-business IDs (plain auto-increment primary keys) as shard keys, data skew is unlikely; shard keys carrying strong business semantics would need additional skew evaluation.
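The "primary-key % shard-count" strategy can be sketched as a routing function that maps an ID to a physical table suffix (names here are illustrative; real sharding middleware does this rewriting transparently):

```java
// Sketch of the "primary-key % N" routing from section 2.3: the physical
// table suffix is derived directly from the ID, so a lookup by primary key
// always hits exactly one shard.
public class ShardRouter {
    public static String physicalTable(String logicalTable, long id, int shardCount) {
        // floorMod guards against negative IDs producing a negative suffix.
        long shard = Math.floorMod(id, (long) shardCount);
        return logicalTable + "_" + shard;
    }
}
```

For example, ID 17 in table x (8 shards) routes to x_1, and ID 33 in table y (32 shards) routes to y_1.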
2.4 Elasticsearch Integration
Advantage: strong query capability over massive data sets, including multi-field and full-table searches.
Disadvantages: higher latency (roughly 10× that of the DB) and lower stability.
Plan: Use ES for y‑table queries that involve many fields; keep x‑table queries in DB, using ES only for full‑table scans.
Risk: No separate data store for ES, so no fallback; only low‑frequency, tolerant queries are routed to ES.
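The routing rule implied by the plan and risk note above can be sketched as a simple decision function (the class, enum, and parameter names are illustrative, not real switches from the system):

```java
// Sketch of the read-routing rule in section 2.4: only low-frequency,
// latency-tolerant queries (e.g. multi-field or full-table searches) go to
// ES; everything else stays on the primary-key-friendly DB path, because ES
// has no separate fallback store and is slower and less stable than the DB.
public class QueryRouter {
    enum Backend { DB, ES }

    public static Backend route(boolean multiFieldQuery, boolean latencyTolerant) {
        return (multiFieldQuery && latencyTolerant) ? Backend.ES : Backend.DB;
    }
}
```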
3. Sharding Test Plan
3.1 Overall Quality Control
Service layer: Ensure all service interfaces function correctly, possibly using traffic replay.
Data layer: Guarantee data consistency during migration and after enabling dual‑write.
Business layer: Validate end‑to‑end flows with both internal and external traffic.
Performance: Assess performance at each switch point.
3.2 Key Test Details
Data consistency: Compare row counts between old and new databases; verify dual‑write consistency.
Interface consistency: Use traffic replay, manual verification, and the QueryCheckResult switch to compare records.
Business correctness: After data and interface validation, deploy code with dual‑write, read, and check switches enabled, then run main business flows.
Performance testing: Conduct baseline pressure tests, capacity planning, and subsequent tests after cache compression, GC changes, and adding read replicas.
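The row-count comparison from the data-consistency check can be sketched as follows; in practice the counts would come from SELECT COUNT(*) against the old table and each new shard (class and method names are illustrative):

```java
import java.util.Map;

// Sketch of the data-consistency check in section 3.2: while dual-write is
// enabled, the row count of the old table must equal the sum of row counts
// across all new shards.
public class ConsistencyCheck {
    public static boolean countsMatch(long oldTableCount, Map<String, Long> shardCounts) {
        long total = shardCounts.values().stream().mapToLong(Long::longValue).sum();
        return total == oldTableCount;
    }
}
```

Row counts alone cannot catch field-level drift, which is why the plan pairs this with record-by-record comparison via traffic replay and the QueryCheckResult switch.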
4. Switch Migration Plan
Step 1: Configure switches for sequence, search limit, summerCache, write mode, and read source.
Step 2: Enable sequence and dual‑write.
Step 3: Stop data‑source sync, monitor counts via DBV to ensure stability.
Step 4: Enable ES read switch, routing reads to new DB and ES.
Step 5: After all consumers switch to the new DB, stop writes to the public DB.
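The ordering constraint implicit in these steps, namely that each switch may only be flipped after the one before it, can be sketched as a small state check. The flag names below are illustrative stand-ins for the real switches (sequence, dual-write, ES read, stopping public-DB writes):

```java
// Sketch of the staged switch migration in section 4. isValidState() rejects
// any combination where a later step is enabled before an earlier one,
// mirroring steps 2 through 5 of the rollout.
public class MigrationSwitches {
    boolean sequenceEnabled;       // step 2: new sequence active
    boolean dualWriteEnabled;      // step 2: writes go to old and new DB
    boolean esReadEnabled;         // step 4: reads routed to new DB and ES
    boolean publicDbWriteStopped;  // step 5: public DB no longer written

    public boolean isValidState() {
        if (dualWriteEnabled && !sequenceEnabled) return false;
        if (esReadEnabled && !dualWriteEnabled) return false;
        if (publicDbWriteStopped && !esReadEnabled) return false;
        return true;
    }
}
```

Encoding the ordering this way makes an out-of-order flip (e.g. enabling dual-write before the new sequence) detectable before it corrupts data.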
5. Overall Migration Architecture
Key stages illustrated with diagrams: switch primary‑key sequence, enable dual‑write, switch reads to new DB, fully migrate reads, and finally stop writes to the public DB.
6. Production Quality
Application stability remained high, with few alerts or errors.
Core interface latency was reduced by over 50% on average (e.g., interface a by 60%, b by 63%, c by 58%, d by 59%, e by 54%, f by 61%, g by 46%).
Capacity planning nearly eliminated dependence on the public DB, reducing risk.
Qunhe Technology Quality Tech