Databases 21 min read

Sharding (Partitioning) vs NewSQL Databases: A Comparative Analysis

This article objectively compares traditional sharding‑plus‑middleware architectures with modern NewSQL distributed databases, examining their design principles, distributed transaction handling, high‑availability mechanisms, scaling and sharding strategies, SQL support, storage engines, and overall maturity to help practitioners choose the most suitable solution for their workloads.

Architecture Digest

Apr 19, 2020

Sharding (Partitioning) vs NewSQL Databases: A Comparative Analysis

Recently, during technical exchanges, I am often asked how to choose between sharding‑plus‑middleware and NewSQL distributed databases; many articles exist, but some opinions are biased. This article aims to objectively compare the two approaches by analyzing their key characteristics, implementation principles, advantages, disadvantages, and suitable scenarios.

What Makes NewSQL Databases Advanced?

According to the paper "pavlo‑newsql‑sigmodrec", NewSQL architectures can be classified into three types: first‑generation NewSQL (e.g., Spanner, TiDB, OceanBase), middleware‑based sharding solutions (e.g., Sharding‑Sphere, Mycat, DRDS), and cloud databases (not discussed here). While middleware‑based sharding does achieve distributed storage and horizontal scaling, it repeats SQL parsing and execution‑plan generation in both middleware and the underlying DB, and relies on B+Tree storage engines, which is less efficient.

Below is a simple architectural comparison (image omitted for brevity):

Traditional databases are disk‑oriented; NewSQL makes better use of memory‑based storage and concurrency control.

Middleware repeats SQL parsing and optimization, leading to lower efficiency.

NewSQL distributed transactions are optimized compared to XA, offering higher performance.

NewSQL stores data using Paxos or Raft multi‑replica protocols, providing true high‑availability and reliability (RTO < 30 s, RPO = 0).

NewSQL natively supports data sharding, automatic migration and scaling, reducing DBA workload and being transparent to applications.

The following sections discuss each point in detail.

Distributed Transactions

This is a double‑edged sword.

CAP Limitation

Early NoSQL systems omitted distributed transactions not because of theoretical limits but due to the CAP theorem: achieving strong consistency inevitably sacrifices either availability (A) or partition tolerance (P). NewSQL does not break CAP; for example, Google Spanner claims to be "practically CA" by operating on a private global network that makes partitions extremely rare.

Why do most NoSQL databases not provide distributed transactions?

Spanner achieves high consistency and >99.999% availability by relying on a private network and a highly skilled operations team, as described in "Spanner, TrueTime and the CAP theorem".

In distributed systems you can know where work is done or when it finishes, but not both; two‑phase commit is essentially an anti‑availability protocol.

Completeness

Two‑phase commit (2PC) does not guarantee strict ACID under all failure scenarios; recovery mechanisms can eventually restore consistency, but true atomicity and consistency may be temporarily violated. Many NewSQL products still have edge cases where distributed transaction support is incomplete.

Nevertheless, distributed transactions are a core mechanism for NewSQL; if performance or completeness suffers, higher‑level SQL correctness is affected.

Performance

Traditional databases also support XA transactions, but the high network overhead, blocking time, and deadlock risk make XA unsuitable for high‑concurrency OLTP. NewSQL often implements optimized 2PC variants, such as Google Percolator’s atomic‑clock + MVCC + Snapshot Isolation, which reduces locking and makes part of the commit asynchronous, improving performance over classic XA.

SI is optimistic locking; in hotspot scenarios it may cause many aborts, and its isolation level differs from Repeatable Read.

However, even optimized 2PC incurs extra GID acquisition, network latency, and log persistence, especially when many nodes participate, which can significantly reduce throughput in scenarios like batch bank transfers.

Spanner’s distributed transaction benchmark results (image omitted).

Although NewSQL vendors claim full support for distributed transactions, applications still need to minimize their use; strong‑consistency transactions are costly, and many systems adopt flexible (BASE) models such as Saga, TCC, or reliable messaging for high‑throughput workloads.

HA and Multi‑Active Geo‑Replication

Master‑slave replication is no longer optimal; even semi‑synchronous replication can lose data under extreme conditions. Modern solutions use Paxos or Raft multi‑replica protocols (e.g., Spanner, TiDB, CockroachDB, OceanBase) to achieve true high‑availability, automatic leader election, and fast failover. Some traditional databases (MySQL Group Replication) are also moving toward Paxos‑based designs.

Implementing a production‑grade consensus algorithm requires handling many failure scenarios, batching, and asynchronous I/O to reduce overhead.

Multi‑active geo‑replication is feasible only when inter‑region latency is low; otherwise, the required majority‑write acknowledgments make latency unacceptable for OLTP.

Scale (Horizontal Expansion) and Sharding Mechanism

Paxos solves availability but not scaling; therefore, sharding is essential. NewSQL databases embed automatic sharding, hotspot detection, and online split/merge/migration, making the process transparent to applications (e.g., TiDB’s region‑based auto‑migration at 64 MB).

In contrast, sharding‑plus‑middleware requires upfront design of shard keys, routing rules, and manual scaling procedures, increasing complexity for most systems.

Sharding can also be performed online via asynchronous replication and read‑only routing switches, but it needs coordinated middleware and database support.

Distributed SQL Support

Both approaches handle single‑shard SQL well. NewSQL, being a unified database, offers richer cross‑shard capabilities (joins, aggregations, complex queries) thanks to built‑in statistics and cost‑based optimization (CBO). Middleware solutions usually rely on rule‑based optimization (RBO) and often lack efficient cross‑shard joins.

Middleware reflects a compromise design focused on application needs, while NewSQL aims for a comprehensive, high‑performance engine.

Storage Engine

Traditional DBs use disk‑oriented B‑Tree engines, which excel at random reads but suffer from random writes. NewSQL often adopts LSM‑tree engines, turning random writes into sequential writes for higher write throughput, at the cost of more complex read‑merge operations. Additional techniques (SSD, bloom filters, caching) mitigate read penalties.

Maturity and Ecosystem

Distributed NewSQL databases are still evolving; they have been adopted mainly in internet companies and non‑core enterprise systems. Traditional relational databases have decades of stability, extensive tooling, and a large talent pool. Choice depends on business needs: fast‑growing internet services may favor NewSQL, while regulated industries (e.g., banking) may prefer proven RDBMS with middleware sharding.

Other features such as online DDL, migration tools, and operational utilities are omitted for brevity.

Conclusion

If you are still unsure, consider the following questions to evaluate whether NewSQL solves a real pain point:

Is strong‑consistent distributed transaction required at the database layer?

Is data growth unpredictable?

Does scaling frequency exceed operational capacity?

Is throughput more important than latency?

Must the solution be completely transparent to applications?

Do you have a DBA team familiar with NewSQL?

If two or three answers are "yes", NewSQL may be worth the learning cost and risk. Otherwise, sharding‑plus‑middleware remains a lower‑cost, lower‑risk option that leverages mature relational ecosystems while providing most needed functionality.

In the current stage where NewSQL is not fully mature, sharding offers a higher lower‑bound and a safer upper‑bound for traditional industries that treat the database as a black‑box product.

These viewpoints are personal and may evolve; discussion is welcomed.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Sharding High Availability Database Architecture NewSQL

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.