How Transparent Sharding Middleware Powers NewSQL: Core Functions Explained
This article explains the fundamentals of transparent sharding middleware in NewSQL systems, covering data partitioning strategies, read/write separation, the complete SQL processing pipeline, protocol adaptation for MySQL/PostgreSQL, distributed transaction models such as XA, Saga and TCC, and essential database governance and online scaling techniques.
Data Sharding
Traditional single‑node relational databases cannot meet the performance and availability demands of massive internet workloads, prompting the use of data sharding to distribute data across multiple databases or tables.
Vertical Sharding
Also called "vertical partitioning," this approach groups tables by business domain into separate databases, reducing load on each node.
Horizontal Sharding
Horizontal sharding distributes rows of a single logical table across multiple databases based on a sharding key, such as modulo of an ID.
Read/Write Separation
Separating read and write traffic into master and replica nodes reduces lock contention and improves query throughput, while multi‑master setups increase availability.
Core Process of Sharding Middleware
The middleware keeps applications unaware of sharding by handling five stages: SQL parsing, routing, rewrite, execution, and result merging, while adapting database protocols to maintain low integration cost.
SQL Parsing
SQL is first tokenized (lexical analysis) and then transformed into an abstract syntax tree (AST) that captures tables, columns, filters, pagination, and, for sharding middleware, placeholder markers.
Request Routing
Based on the AST and sharding strategy, the middleware generates a routing path. Queries with a sharding key may be routed to a single shard (equality), multiple shards (IN), a range of shards (BETWEEN), or broadcast if no key is present.
SQL Rewrite
For sharding middleware, the original SQL is rewritten to reference the physical tables and to embed pagination, sorting, or aggregation adjustments (e.g., converting AVG to SUM/COUNT).
Result Merging
After execution on each shard, results are merged. Simple queries use stream merging (row‑by‑row iteration), while complex aggregations may require loading all rows into memory for in‑memory merging.
Protocol Adaptation
To appear as a native relational database, the middleware implements MySQL and PostgreSQL wire protocols. For MySQL, a packet consists of payload length, sequence ID, and payload. The connection phase handles capability negotiation, SSL setup, and authentication. The command phase supports 32 command types, including COM_QUERY, COM_STMT_PREPARE, COM_STMT_EXECUTE, and their respective response packets (OkPacket, ErrPacket, Resultset rows).
Distributed Transactions
NewSQL must preserve ACID properties across shards. Two main families exist:
XA Protocol (Strong Consistency)
XA uses a two‑phase commit: a prepare phase where each resource confirms readiness, followed by a commit or rollback phase. It guarantees atomicity but can become a performance bottleneck for long‑running transactions.
Flexible Transactions (BASE)
Flexible models relax strict isolation:
Maximum Effort Delivery : retries failed writes until success; no rollback.
Saga : a series of local transactions with compensating actions; supports forward and backward recovery.
TCC (Try‑Confirm‑Cancel) : explicit Try, Confirm, and Cancel phases; provides atomicity while moving isolation to the business layer.
Message‑driven approaches achieve eventual consistency by persisting local changes and publishing events to downstream services.
Database Governance
Governance includes configuration centers, service registries, rate limiting, circuit breaking, failover, and tracing. Elastic scaling for cloud‑native databases involves online data migration with four steps: dual‑write, historical data migration, data source switch, and cleanup of obsolete shards.
Online Data Migration
Allows schema changes or sharding strategy updates without downtime, supporting both data expansion and online DDL operations.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
