
Choosing Between NewSQL Databases and Middleware‑Based Sharding: Advantages, Trade‑offs and Practical Guidance

The article objectively compares NewSQL distributed databases with middleware‑plus‑sharding solutions, covering architectural differences, distributed transaction handling, high‑availability, scaling, SQL support, storage engines, maturity, and provides a decision‑making checklist to help engineers select the most suitable approach for their workloads.

Top Architect

Recently, while exchanging ideas with peers, I have often been asked how to choose between sharding‑plus‑middleware and NewSQL distributed databases. Many online articles are either overly promotional or overly critical, so this piece aims to present an objective, neutral comparison of the two paradigms.

This article compares the key characteristics and implementation principles of the two models to clarify their real strengths, weaknesses, and suitable scenarios.

What Makes NewSQL Databases Advanced?

First, regarding whether “middleware + relational DB sharding” counts as NewSQL: the SIGMOD Record paper “What’s Really New with NewSQL?” by Pavlo and Aslett classifies Spanner, TiDB, and OceanBase as the first generation of NewSQL architectures. Middleware solutions such as Sharding‑Sphere, Mycat, and DRDS are considered the second generation, while cloud‑native databases form a third generation (not covered here).

Even though the storage is distributed, the middleware‑based approach performs SQL parsing and plan generation twice: once in the middleware and again in the underlying database, which leads to duplicated work.

Is it a “pseudo” distributed database?

From an architectural standpoint, the answer is partly yes. The middleware layer duplicates SQL parsing and execution‑plan generation, and the underlying databases still use traditional B+Tree storage engines, which are less efficient than the native designs of NewSQL systems.

Comparing the two architectures side by side, NewSQL vendors typically highlight the following points:

Traditional databases are disk‑oriented; NewSQL databases are memory‑oriented and make more efficient use of resources.

Middleware repeats SQL parsing and optimization, resulting in lower efficiency.

NewSQL distributed transactions are optimized beyond classic XA, offering higher performance.

NewSQL stores data using Paxos or Raft multi‑replica protocols, providing true high‑availability (RTO < 30 s, RPO = 0).

NewSQL automatically handles sharding, migration and scaling, making DBA work easier and keeping the application transparent.

These points are often highlighted by NewSQL vendors, but are they really as perfect as advertised? The following sections discuss each claim in detail.

Distributed Transactions

This is a double‑edged sword.

CAP Limitation

NoSQL databases historically avoided distributed transactions because of the CAP theorem: under a network partition, a system must sacrifice either strong consistency or availability. NewSQL does not break CAP; Google Spanner, for example, claims to be “practically CA” because it runs on Google’s private global network, which makes partitions rare enough to treat as negligible.

Completeness

Two‑phase commit (2PC) can provide ACID, but in the commit phase failures and network delays can cause visibility issues. Some NewSQL products still have incomplete transaction support in edge cases.
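To make the in‑doubt window concrete, here is a toy 2PC coordinator (the class names and structure are illustrative, not any product’s API). The key observation is in the comment between the two phases: a coordinator crash after participants have prepared but before the decision reaches them leaves them blocked, holding locks.

```python
class Participant:
    """A resource manager in 2PC: init -> prepared -> committed/aborted."""
    def __init__(self, name):
        self.name = name
        self.state = "init"

    def prepare(self):
        # Phase 1: write redo/undo logs, acquire locks, vote yes
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(coordinator_log, participants):
    # Phase 1: collect votes; any "no" aborts everyone
    if not all(p.prepare() for p in participants):
        for p in participants:
            p.abort()
        return "aborted"
    # The decision must be made durable before phase 2 starts.
    # If the coordinator crashes right here, every participant is
    # stuck "prepared" (in doubt), holding locks until recovery.
    coordinator_log.append("commit")
    # Phase 2: deliver the decision
    for p in participants:
        p.commit()
    return "committed"
```

Real implementations add coordinator recovery and participant timeouts precisely because of that window; the sketch only shows why the window exists.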

Performance

Traditional relational databases also support XA, but the high network overhead of 2PC makes it unsuitable for high‑throughput OLTP. NewSQL often implements transaction models based on Google Percolator, using a Timestamp Oracle (TSO), MVCC, and Snapshot Isolation (SI). Percolator designates one key’s lock as the primary: committing the primary is the transaction’s commit point, and the secondary locks can be resolved asynchronously afterwards, improving performance compared with classic XA.
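A minimal sketch of that Percolator‑style commit path, assuming a single‑process counter in place of a real TSO (all names are illustrative; real implementations also validate against the write column for write‑write conflicts and handle crash recovery):

```python
import itertools

_tso = itertools.count(1)          # stand-in for a timestamp oracle (TSO)

DATA, LOCK, WRITE = {}, {}, {}     # the three Percolator column families


def percolator_commit(writes):
    """writes: {key: value}. The first key acts as the primary."""
    start_ts = next(_tso)
    if any(key in LOCK for key in writes):
        return None                # lock conflict: optimistic abort
    primary, *secondaries = writes
    # Prewrite: lock every key; secondary locks point at the primary
    for key, value in writes.items():
        LOCK[key] = primary
        DATA[(key, start_ts)] = value
    commit_ts = next(_tso)
    # Commit point: writing the primary's commit record makes the
    # whole transaction durable; secondaries are resolved lazily
    WRITE[(primary, commit_ts)] = start_ts
    del LOCK[primary]
    for key in secondaries:        # asynchronous in the real protocol
        WRITE[(key, commit_ts)] = start_ts
        del LOCK[key]
    return commit_ts
```

The point of the primary lock is that only one record needs to be written atomically at commit time; everything after it is cleanup that readers can perform on demand.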

SI is optimistic; in hot‑spot scenarios it may cause many aborts, and its isolation level is not the same as Repeatable Read: SI permits write skew, where two transactions each read overlapping rows but update disjoint ones, producing a combined result neither could have produced alone.

Nevertheless, the extra global‑transaction‑ID (GID) acquisition, network round‑trips, and log persistence in 2PC still cause noticeable performance loss, especially when many nodes are involved (e.g., batch deductions in banking).

NewSQL products often advise keeping distributed transactions to a minimum and preferring “soft” transactions (BASE) such as Saga, TCC, or reliable messaging for large‑scale OLTP.
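A Saga is the simplest of those “soft” patterns: every local step is paired with a compensating action, and a failure rolls back completed steps in reverse by running their compensations. A minimal sketch (the step/compensation structure is illustrative):

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs.

    Execute actions in order; if one raises, run the compensations of
    all already-completed steps in reverse order (eventual consistency,
    not atomicity: intermediate states are visible to other readers).
    """
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):
            compensation()     # semantic undo of a completed step
        return "compensated"
    return "completed"
```

Unlike 2PC, no locks are held across steps, which is why this scales for large OLTP flows; the price is that compensations must be business‑level undos, not storage‑level rollbacks.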

HA and Multi‑Region Active‑Active

Traditional master‑slave replication (even semi‑sync) can lose data under extreme conditions. Modern solutions based on Paxos or Raft—such as Google Spanner, TiDB, CockroachDB, OceanBase—provide multi‑replica storage with majority‑write rules, automatic leader election, and fast failover.
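The majority‑write rule behind that guarantee fits in a few lines: an entry is committed once a majority of replicas store it, so losing any minority of nodes loses no committed data. For a Raft leader that tracks each replica’s replicated log index, the commit index reduces to a median (a sketch, assuming the usual odd replica count):

```python
def majority_commit_index(match_index):
    """match_index: highest log index stored on each replica,
    leader included. The commit index is the largest index present
    on a majority of replicas -- the median, for an odd-size group."""
    n = len(match_index)
    return sorted(match_index)[n // 2]
```

For example, with five replicas at indices [10, 4, 7, 7, 2], index 7 is on three of five nodes and is therefore committed, even though two replicas lag behind.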

Even MySQL now offers Group Replication, which may eventually replace classic master‑slave setups.

Implementing true multi‑region active‑active requires low network latency; otherwise the added commit latency makes it unsuitable for high‑frequency OLTP.

Scale and Sharding Mechanism

Paxos solves availability but not horizontal scaling, so sharding is mandatory. NewSQL databases embed automatic sharding, monitor per‑shard load, and split/merge regions transparently (e.g., TiDB automatically splits a region once it exceeds a configurable size threshold, 96 MiB by default).
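The split mechanism itself is simple range partitioning; the sketch below shows the idea (the 96 MiB threshold is illustrative of a common default, and a real scheduler also picks the split key from key‑distribution statistics and rebalances the halves across nodes):

```python
REGION_SPLIT_BYTES = 96 * 1024 * 1024   # illustrative split threshold


class Region:
    """A contiguous key range [start, end) with an approximate size."""
    def __init__(self, start, end, size=0):
        self.start, self.end, self.size = start, end, size


def maybe_split(region, split_key):
    """Return the region unchanged, or two halves once it outgrows
    the threshold. The application never sees the split happen."""
    if region.size < REGION_SPLIT_BYTES:
        return [region]
    left = Region(region.start, split_key, region.size // 2)
    right = Region(split_key, region.end, region.size - region.size // 2)
    return [left, right]
```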

In contrast, middleware‑based sharding forces developers to design split keys, routing rules, and scaling procedures manually, which adds considerable complexity.

Online scaling with middleware is possible via asynchronous replication and read‑only phases, but it requires tight coordination between middleware and the DB.
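To make the contrast concrete, here is the kind of routing rule a team must design and maintain by hand in the middleware model. The 4×8 layout, key choice, and naming are all hypothetical:

```python
import zlib

N_DBS, N_TABLES = 4, 8   # illustrative layout: 4 databases x 8 tables each


def route(user_id):
    """Map a split key to a physical database and table, the way a
    sharding middleware's routing rule might (layout is hypothetical)."""
    slot = zlib.crc32(str(user_id).encode()) % (N_DBS * N_TABLES)
    return f"db_{slot // N_TABLES}.user_{slot % N_TABLES}"
```

Note the operational trap: changing N_DBS remaps almost every key, so scaling out means migrating most rows. This is exactly the manual procedure that NewSQL’s automatic region split/merge replaces.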

Distributed SQL Support

Both models support single‑shard queries, but NewSQL offers full cross‑shard joins, aggregations, and a cost‑based optimizer (CBO) that leverages global statistics. Middleware typically relies on rule‑based optimization (RBO) and may not support cross‑shard joins efficiently.
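The gap shows up in how aggregates merge. A sketch of middleware‑style scatter‑gather, with shards modeled as plain row lists: COUNT merges trivially, but AVG must be rewritten into SUM and COUNT before the partials can be combined, and a join that cannot be pushed down forces pulling rows to the middleware.

```python
def scatter_gather_count(shards, predicate):
    """COUNT merges trivially: sum the per-shard partial counts."""
    return sum(sum(1 for row in shard if predicate(row)) for shard in shards)


def scatter_gather_avg(shards, column):
    """AVG does not: averaging per-shard averages is wrong, so the
    query is rewritten into SUM and COUNT, and those are merged."""
    partials = [(sum(row[column] for row in s), len(s)) for s in shards]
    total, n = (sum(x) for x in zip(*partials))
    return total / n
```

A CBO with global statistics can additionally choose join orders and pushdowns across shards; a rule‑based middleware applies the same rewrite regardless of data distribution.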

NewSQL products generally offer limited or no support for stored procedures, views, and foreign keys, while middleware can pass such features through to the underlying relational DB.

Storage Engine

Traditional engines use B+Tree structures optimized for disk I/O; they excel at reads but suffer from random‑write overhead. NewSQL often adopts LSM‑tree storage, turning random writes into sequential writes, which boosts write throughput at the cost of slightly slower reads. Additional techniques (SSD caching, bloom filters) mitigate the read penalty.
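A toy LSM illustrating that trade‑off: puts land in an in‑memory memtable and are flushed as sorted, immutable runs (sequential I/O), while gets may have to search several runs from newest to oldest. This is purely illustrative; real engines add a write‑ahead log, compaction, and the bloom filters mentioned above to cut the read penalty.

```python
import bisect


class TinyLSM:
    """Minimal LSM sketch: random writes become sequential flushes."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}       # absorbs random writes in memory
        self.runs = []           # sorted immutable runs ("SSTables")
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: one sorted, sequential write instead of many
            # random in-place updates
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):          # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

The read path is where the cost moves: a miss scans every run, which is exactly what bloom filters short‑circuit in production engines.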

Maturity and Ecosystem

Evaluating a distributed database requires a multi‑dimensional model: development status, community activity, monitoring tools, feature completeness, DBA talent pool, SQL compatibility, performance and HA testing, online DDL, etc. NewSQL products have matured rapidly in internet‑scale scenarios but still lag behind decades‑old relational databases in overall stability, tooling, and talent availability.

For internet companies facing explosive data growth, NewSQL’s automatic scaling and reduced operational overhead are attractive. For regulated industries (e.g., banking) that prioritize proven reliability and have existing DBA expertise, middleware‑based sharding remains a safer, lower‑risk choice.

Conclusion & Decision Checklist

If you are still unsure which model to adopt, consider the following questions:

Do you need strong‑consistent transactions at the database layer?

Is data growth unpredictable?

Does your team struggle with frequent scaling operations?

Do you prioritize throughput over low latency?

Must the solution be completely transparent to the application?

Do you have DBAs experienced with NewSQL?

If two or three answers are “yes”, a NewSQL database may be worth the learning curve. Otherwise, a well‑designed middleware‑plus‑sharding architecture offers a lower‑cost, lower‑risk path while still meeting most OLTP requirements.
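The rule of thumb can be written down as a purely illustrative scoring helper (the questions are the checklist above; the two‑yes threshold is the article’s heuristic, not a hard rule):

```python
QUESTIONS = [
    "Do you need strong-consistent transactions at the database layer?",
    "Is data growth unpredictable?",
    "Does your team struggle with frequent scaling operations?",
    "Do you prioritize throughput over low latency?",
    "Must the solution be completely transparent to the application?",
    "Do you have DBAs experienced with NewSQL?",
]


def recommend(answers):
    """answers: one boolean per question above, in order.
    Two or more 'yes' answers tip the balance toward NewSQL."""
    return "NewSQL" if sum(answers) >= 2 else "middleware + sharding"
```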

Both approaches have trade‑offs; there is no silver bullet. Choose the one that aligns with your business constraints, technical expertise, and performance goals.

Tags: scalability, sharding, high availability, NewSQL, distributed databases, transaction management
Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.