Databases 28 min read

How Transparent Sharding Middleware Powers NewSQL: Core Functions Explained

This article explains the fundamentals of transparent sharding middleware in NewSQL systems, covering data partitioning strategies, read/write separation, the complete SQL processing pipeline, protocol adaptation for MySQL/PostgreSQL, distributed transaction models such as XA, Saga and TCC, and essential database governance and online scaling techniques.

JD Cloud Developers

Mar 5, 2021

How Transparent Sharding Middleware Powers NewSQL: Core Functions Explained

Data Sharding

Traditional single‑node relational databases cannot meet the performance and availability demands of massive internet workloads, prompting the use of data sharding to distribute data across multiple databases or tables.

Vertical Sharding

Also called "vertical partitioning," this approach groups tables by business domain into separate databases, reducing load on each node.

Horizontal Sharding

Horizontal sharding distributes rows of a single logical table across multiple databases based on a sharding key, such as modulo of an ID.

Read/Write Separation

Separating read and write traffic into master and replica nodes reduces lock contention and improves query throughput, while multi‑master setups increase availability.

Core Process of Sharding Middleware

The middleware keeps applications unaware of sharding by handling five stages: SQL parsing, routing, rewrite, execution, and result merging, while adapting database protocols to maintain low integration cost.

SQL Parsing

SQL is first tokenized (lexical analysis) and then transformed into an abstract syntax tree (AST) that captures tables, columns, filters, pagination, and, for sharding middleware, placeholder markers.

Request Routing

Based on the AST and sharding strategy, the middleware generates a routing path. Queries with a sharding key may be routed to a single shard (equality), multiple shards (IN), a range of shards (BETWEEN), or broadcast if no key is present.

SQL Rewrite

For sharding middleware, the original SQL is rewritten to reference the physical tables and to embed pagination, sorting, or aggregation adjustments (e.g., converting AVG to SUM/COUNT).

Result Merging

After execution on each shard, results are merged. Simple queries use stream merging (row‑by‑row iteration), while complex aggregations may require loading all rows into memory for in‑memory merging.

Protocol Adaptation

To appear as a native relational database, the middleware implements MySQL and PostgreSQL wire protocols. For MySQL, a packet consists of payload length, sequence ID, and payload. The connection phase handles capability negotiation, SSL setup, and authentication. The command phase supports 32 command types, including COM_QUERY, COM_STMT_PREPARE, COM_STMT_EXECUTE, and their respective response packets (OkPacket, ErrPacket, Resultset rows).

Distributed Transactions

NewSQL must preserve ACID properties across shards. Two main families exist:

XA Protocol (Strong Consistency)

XA uses a two‑phase commit: a prepare phase where each resource confirms readiness, followed by a commit or rollback phase. It guarantees atomicity but can become a performance bottleneck for long‑running transactions.

Flexible Transactions (BASE)

Flexible models relax strict isolation:

Maximum Effort Delivery : retries failed writes until success; no rollback.

Saga : a series of local transactions with compensating actions; supports forward and backward recovery.

TCC (Try‑Confirm‑Cancel) : explicit Try, Confirm, and Cancel phases; provides atomicity while moving isolation to the business layer.

Message‑driven approaches achieve eventual consistency by persisting local changes and publishing events to downstream services.

Database Governance

Governance includes configuration centers, service registries, rate limiting, circuit breaking, failover, and tracing. Elastic scaling for cloud‑native databases involves online data migration with four steps: dual‑write, historical data migration, data source switch, and cleanup of obsolete shards.

Online Data Migration

Allows schema changes or sharding strategy updates without downtime, supporting both data expansion and online DDL operations.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

scalability Sharding Database Middleware NewSQL distributed transactions Database Governance SQL Protocol

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.