Databases 21 min read

NewSQL vs Sharding: Which Database Architecture Truly Wins?

This article objectively compares NewSQL distributed databases with middleware‑based sharding solutions, examining their architectural differences, distributed transaction handling, high‑availability mechanisms, scaling and sharding strategies, SQL support, storage engines, and overall maturity to help readers decide which approach best fits their workload.

Programmer DD

Jan 12, 2021

NewSQL vs Sharding: Which Database Architecture Truly Wins?

What makes NewSQL databases advanced?

First, regarding whether "middleware + relational database sharding" counts as a NewSQL distributed database, a SIGMOD paper classifies Spanner, TiDB, and OceanBase as the first new‑architecture type, while Sharding‑Sphere, Mycat, DRDS and similar middleware belong to the second type. Although middleware‑based sharding distributes storage and can achieve horizontal scaling, it duplicates SQL parsing and execution‑plan generation and relies on B+Tree storage engines, which is inefficient. For clarity, this article treats NewSQL as the new‑architecture databases.

NewSQL databases are advanced over middleware‑based sharding in several ways:

Traditional databases are disk‑oriented; NewSQL utilizes memory‑centric storage management and concurrency control for higher efficiency.

Middleware repeats SQL parsing and plan optimization, reducing overall efficiency.

NewSQL’s distributed transactions are optimized compared to XA, offering better performance.

NewSQL stores data using multi‑replica Paxos (or Raft) protocols, providing true high availability and reliability (RTO < 30 s, RPO = 0).

NewSQL natively supports data sharding, automating migration and scaling without requiring applications to specify sharding keys.

These points are often highlighted by NewSQL vendors, but are they truly as beneficial as advertised? The following sections discuss each point in detail.

Distributed Transactions

This is a double‑edged sword.

CAP Limitation

Early NoSQL systems avoided distributed transactions not because of theoretical limits but due to the CAP theorem: guaranteeing strong consistency inevitably sacrifices either availability or partition tolerance. NewSQL does not break CAP; Google Spanner, the archetype for many NewSQL systems, claims to be "practically CA" because its private global network makes network partitions extremely unlikely.

In distributed systems you can know where work is happening or when it finishes, but you cannot know both simultaneously; two‑phase commit is essentially an anti‑availability protocol.

Completeness

Two‑phase commit (2PC) does not strictly guarantee ACID under all failure scenarios; recovery mechanisms can eventually restore atomicity and consistency, but there are edge cases. Many NewSQL products still have incomplete distributed‑transaction support, as evidenced by real‑world cases where they fail.

Because distributed transactions are a core mechanism for cross‑shard DML/DDL, any performance or completeness shortcomings can severely affect correctness.

Performance

Traditional databases also support distributed transactions via XA, but XA incurs high network overhead, long blocking times, and deadlocks, making it unsuitable for high‑concurrency OLTP. NewSQL typically uses optimized 2PC variants, such as Google Percolator’s atomic‑clock + MVCC + Snapshot Isolation, which reduces locking and offloads part of the commit asynchronously, improving performance over XA.

Snapshot Isolation is optimistic; under hotspot workloads it may cause many commit failures, and its isolation level differs from true Repeatable Read.

Nevertheless, the extra steps in 2PC (global ID acquisition, network round‑trips, prepare‑log persistence) still impose noticeable performance penalties, especially when many nodes are involved.

Although NewSQL products advertise full support for distributed transactions, applications still need to minimize their use.

Given the high cost of strong‑consistency transactions, many teams prefer flexible (BASE) approaches such as Saga, TCC, or reliable messaging for large‑scale, high‑throughput OLTP.

High Availability and Multi‑Region Active‑Active

Traditional master‑slave replication, even semi‑synchronous, can lose data under extreme conditions. Modern solutions adopt Paxos or Raft multi‑replica protocols (e.g., Spanner, TiDB, CockroachDB, OceanBase), providing automatic leader election, majority‑write guarantees, and rapid failover, reducing operational overhead.

Implementing a production‑grade consensus algorithm requires careful engineering, batching, and asynchronous optimizations.

While Paxos‑based designs can theoretically support active‑active across regions, they demand low inter‑region latency; high latency (tens of milliseconds) makes true active‑active impractical for OLTP workloads.

Some companies achieve multi‑region availability at the application layer by dual‑writing transaction data via message queues and using distributed caches to mask latency, but this approach still involves trade‑offs.

Scale and Sharding Mechanism

Paxos solves availability but not horizontal scaling; therefore built‑in sharding is essential. NewSQL databases embed automatic sharding, hotspot detection, and region splitting (e.g., TiDB’s region migration at 64 MiB). In contrast, middleware‑based sharding requires developers to pre‑define sharding keys, strategies, routing rules, and scaling procedures, adding significant complexity.

Sharding can be performed online via asynchronous replication and read‑only phases, but it relies on coordinated middleware and database support.

Uniform sharding strategies may not align with domain models, leading to cross‑shard transactions that degrade performance (e.g., banking workloads where customer, account, and transaction tables are naturally co‑located).

Distributed SQL Support

Both approaches handle single‑shard queries well. NewSQL, being a general‑purpose distributed database, offers richer SQL support, including cross‑shard joins, aggregations, and cost‑based optimization (CBO) thanks to global statistics. Middleware solutions often rely on rule‑based optimization (RBO) and may lack efficient cross‑shard query execution.

NewSQL typically supports MySQL or PostgreSQL protocols, limiting compatibility to those dialects, while middleware can proxy many relational engines, preserving stored procedures, views, and foreign keys when operating on a single node.

Middleware reflects a compromise design focused on application compatibility, whereas NewSQL aims for a comprehensive, high‑performance backend.

Storage Engine

Traditional engines are disk‑oriented B+Tree structures, optimized for random reads but suffering from random writes. NewSQL often adopts LSM‑tree storage, turning random writes into sequential writes for higher write throughput, at the cost of more complex reads that are mitigated with SSDs, caches, and bloom filters.

Although NewSQL incurs overhead from replication and distributed transactions, its elastic clustering yields higher overall QPS compared to single‑node databases.

Maturity and Ecosystem

Evaluating distributed databases requires multidimensional testing: development status, community, monitoring tools, feature coverage, DBA talent, SQL compatibility, performance, HA, online scaling, distributed transactions, isolation levels, online DDL, etc. NewSQL has matured in internet‑scale scenarios but remains less proven in high‑risk domains like banking, where traditional RDBMS still dominate due to decades of stability and ecosystem.

For fast‑growing internet companies, NewSQL’s reduced need for manual sharding and its higher throughput are attractive. For risk‑averse enterprises, middleware‑based sharding offers lower technical barriers and leverages existing RDBMS ecosystems.

Due to space limits, topics such as online DDL, data migration, and operational tooling are omitted.

Conclusion

Before choosing a solution, consider the following questions:

Is strong consistency required at the database layer?

Is data growth unpredictable?

Does scaling frequency exceed operational capacity?

Is throughput more important than latency?

Must the solution be transparent to the application?

Do you have a DBA team familiar with NewSQL?

If two or three of these are affirmative, NewSQL may be worth exploring despite its learning curve. Otherwise, traditional sharding with middleware remains a lower‑risk, lower‑cost option, especially for legacy or highly regulated industries.

In the current stage where NewSQL is not yet fully mature, sharding provides a higher floor and lower ceiling, making it a safe choice for many core systems.

Software selection ultimately depends on domain characteristics and architect preferences.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

transaction sharding high availability NewSQL distributed databases

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.