NewSQL vs Middleware Sharding: Which Architecture Truly Wins?
This article objectively compares NewSQL databases with middleware‑based sharding solutions, examining architecture, distributed transactions, CAP constraints, high availability, scaling, SQL support, storage engines, and maturity to help readers choose the right approach for their workloads.
When discussing database scaling, the choice between NewSQL databases and middleware‑based sharding (traditional relational databases with middleware) is often debated. This article aims to objectively compare the two approaches by analyzing their core characteristics, implementation principles, advantages, disadvantages, and suitable scenarios.
What Makes NewSQL Databases Advanced?
According to the classification in Pavlo and Aslett's SIGMOD Record paper "What's Really New with NewSQL?", systems like Spanner, TiDB, and OceanBase belong to the first category of NewSQL (new architectures), while middleware solutions such as Sharding‑Sphere, Mycat, and DRDS belong to the second (transparent sharding middleware).
Middleware plus sharding is technically a distributed architecture, since storage is spread across nodes and can scale horizontally, but it is often called a "pseudo" distributed database: SQL parsing and execution‑plan generation happen twice, once in the middleware and again in the underlying database.
NewSQL databases differ from middleware‑based sharding in several key ways:
Traditional databases are disk‑oriented, while NewSQL databases efficiently use in‑memory storage and concurrency control.
Middleware repeats SQL parsing and optimization, leading to lower efficiency.
NewSQL distributed transactions are optimized beyond XA, offering higher performance.
NewSQL stores data using Paxos or Raft multi‑replica protocols, providing true high availability and zero data loss (RTO < 30 s, RPO = 0).
NewSQL natively supports automatic sharding, migration, and scaling without requiring application‑level sharding keys.
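To make the middleware side of this contrast concrete, here is a minimal sketch of the routing step a sharding middleware performs on every statement. The names (`route_shard`, `NUM_SHARDS`) are illustrative, not any real product's API; the point is that the application must design the sharding key up front, and a query that lacks it falls back to scatter‑gather across all shards.

```python
NUM_SHARDS = 4  # fixed at design time; changing it later means data migration


def route_shard(user_id: int) -> str:
    """Hash-mod routing: the application must always supply the sharding key."""
    return f"db_{user_id % NUM_SHARDS}"


def route_query(sharding_key):
    """A statement without the sharding key is broadcast to every shard."""
    if sharding_key is None:
        return [f"db_{i}" for i in range(NUM_SHARDS)]  # scatter-gather
    return [route_shard(sharding_key)]
```

A NewSQL store hides exactly this decision: data is range‑partitioned and re‑partitioned internally, so neither the key choice nor the shard count leaks into application code.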
Distributed Transactions
CAP Limitation
NewSQL databases such as Google Spanner claim "effectively CA" behavior with extremely high availability, achieved not by defeating CAP but by minimizing network partitions through private global networks and strong operations teams.
Completeness
Two‑phase commit (2PC) does not guarantee strict ACID under all failure scenarios; recovery mechanisms can eventually restore consistency, but temporary anomalies may occur.
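The classic anomaly is easy to see in a toy simulation of 2PC (all names here are illustrative): once a participant votes "yes" in phase 1, it must hold its locks until the coordinator's decision arrives, so a coordinator crash after prepare leaves participants in doubt.

```python
class Participant:
    def __init__(self):
        self.state = "init"  # init -> prepared -> committed/aborted

    def prepare(self) -> bool:
        self.state = "prepared"  # locks are held from this point on
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(parts, coordinator_crashes_after_prepare=False):
    # Phase 1: collect votes
    if not all(p.prepare() for p in parts):
        for p in parts:
            p.abort()
        return "aborted"
    if coordinator_crashes_after_prepare:
        # No decision reaches the participants: they stay "prepared"
        # (in doubt), holding locks until recovery completes.
        return "in-doubt"
    # Phase 2: broadcast the decision
    for p in parts:
        p.commit()
    return "committed"
```

The "in‑doubt" branch is the blocking window the text refers to: correctness is eventually restored by recovery, but other transactions touching the locked rows stall in the meantime.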
Many NewSQL products still have incomplete distributed‑transaction support, and their performance depends on the underlying implementation.
Performance
Traditional databases use XA 2PC, which incurs high network overhead and blocking time, making it unsuitable for high‑concurrency OLTP workloads.
NewSQL often adopts optimized commit protocols built on 2PC (e.g., Google Percolator's timestamp‑oracle model, or Spanner's TrueTime atomic clocks) combined with MVCC and snapshot isolation, reducing lock contention and improving throughput.
Snapshot isolation (SI) with optimistic concurrency control can cause frequent commit failures under hotspot workloads, and SI is not identical to repeatable read (it permits write skew, for example).
Despite these optimizations, the overhead of acquiring a global transaction timestamp, extra network round trips, and persisting prepare logs still impacts performance, especially as the number of nodes grows.
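The hotspot commit failures mentioned above follow directly from the optimistic, Percolator‑style commit rule: a transaction aborts if any key it writes was committed by another transaction after its start timestamp. This is an illustrative sketch, not TiDB's actual implementation; `TSO` stands in for a timestamp oracle.

```python
class TSO:
    """Timestamp oracle: single source of monotonically increasing timestamps."""

    def __init__(self):
        self.ts = 0

    def next(self) -> int:
        self.ts += 1
        return self.ts


class Store:
    def __init__(self, tso):
        self.tso = tso
        self.commit_ts = {}  # key -> timestamp of its last commit

    def begin(self) -> int:
        return self.tso.next()

    def try_commit(self, start_ts, keys) -> bool:
        # Write-write conflict check: did anyone commit our key after we began?
        if any(self.commit_ts.get(k, 0) > start_ts for k in keys):
            return False  # abort; the caller must retry the whole transaction
        ts = self.tso.next()
        for k in keys:
            self.commit_ts[k] = ts
        return True
```

With many concurrent writers on one "hot" key, all but the first committer abort and retry, which is exactly why hotspot rows hurt SI‑based systems more than lock‑based ones.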
HA and Multi‑Region Active‑Active
Traditional master‑slave replication can still lose data (semi‑synchronous replication, for instance, may degrade to asynchronous on timeout). Modern solutions based on Paxos or Raft (e.g., Spanner, TiDB, OceanBase) provide automatic leader election, majority‑based durability, and fast failover.
While Paxos‑ or Raft‑based multi‑replica consensus can be applied to MySQL (e.g., MySQL Group Replication), true active‑active across distant regions faces latency challenges; network delays of tens of milliseconds make strict strong consistency impractical for many OLTP scenarios.
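The latency penalty has a simple shape: a Raft/Paxos leader must wait for a majority of replicas to acknowledge each write, so commit latency is the ack time of the slowest member of the fastest majority. The helper and the RTT figures below are illustrative only.

```python
def quorum_commit_latency(replica_ack_ms):
    """Commit completes when the (majority)-th fastest replica has acked.

    replica_ack_ms: per-replica ack times in milliseconds, including the
    leader's own local write (modeled as a small value).
    """
    majority = len(replica_ack_ms) // 2 + 1
    return sorted(replica_ack_ms)[majority - 1]


# Three replicas in one city: every write commits in a few ms.
same_city = quorum_commit_latency([1, 2, 3])

# Five replicas across three distant regions: the majority ack must
# cross a WAN link, so tens of ms are unavoidable per write.
cross_region = quorum_commit_latency([1, 2, 30, 35, 40])
```

This is why "three data centers in two cities" style deployments keep a majority of replicas close together: the quorum, not the farthest replica, sets the write latency floor.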
Scalability and Sharding Mechanism
NewSQL databases embed automatic sharding, dynamically splitting hot regions and migrating data without application awareness. In contrast, middleware‑based sharding requires explicit design of sharding keys, routing rules, and manual scaling procedures.
Online scaling for sharding‑plus‑middleware is possible via asynchronous replication and read‑only routing switches, but it demands coordinated middleware and database actions.
Uniform built‑in sharding strategies may not align with domain models, leading to distributed transactions for certain business patterns (e.g., banking core systems).
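The automatic splitting described above can be sketched in a few lines (TiKV‑like in spirit; the threshold and names are illustrative, and real systems split by bytes, e.g. around 96 MiB per region, rather than key count): a region covering a sorted key range splits at its midpoint once it grows past a threshold, with no application involvement.

```python
SPLIT_THRESHOLD = 4  # keys per region; illustrative stand-in for a byte limit


def split_regions(regions):
    """One pass of the background splitter over a list of key-sorted regions."""
    out = []
    for keys in regions:
        if len(keys) > SPLIT_THRESHOLD:
            mid = len(keys) // 2
            out.append(keys[:mid])   # new left region
            out.append(keys[mid:])   # new right region
        else:
            out.append(keys)
    return out
```

After a split, the scheduler can move one of the new regions to another node, which is how hot ranges are spread without any resharding project on the application side.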
Distributed SQL Support
NewSQL databases aim for full MySQL/PostgreSQL compatibility, supporting cross‑shard joins, aggregations, and complex queries through cost‑based optimization (CBO). Middleware solutions often rely on rule‑based optimization (RBO) and lack comprehensive cross‑shard capabilities.
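A small example shows why cross‑shard aggregation needs real query rewriting rather than simple routing: `AVG` cannot be merged from per‑shard averages, so the optimizer must decompose it into `SUM` and `COUNT` pushed down to each shard and combined afterwards. The helpers below are an illustrative sketch of that rewrite.

```python
def shard_partial(rows):
    """What each shard computes locally: the pushed-down SUM(x) and COUNT(x)."""
    return sum(rows), len(rows)


def global_avg(partials):
    """Merge step on the coordinator: combine partial sums and counts."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Averaging the per‑shard averages here would give (2.0 + 10.0) / 2 = 6.0, which is wrong; middleware that lacks this decomposition either returns incorrect results or pulls all rows to the client.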
Storage Engine
Traditional engines use B‑Tree structures optimized for disk access, while NewSQL often adopts LSM trees, converting random writes into sequential writes for higher write throughput, at the cost of read amplification (a lookup may consult several levels).
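The write/read trade‑off is visible even in a toy LSM store (illustrative only): writes go to an in‑memory memtable (plus a sequential log append in a real engine), a full memtable flushes as an immutable sorted run, and a read must check the memtable and then every run from newest to oldest.

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}       # in-memory, absorbs random writes
        self.runs = []           # immutable sorted runs, oldest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: one sequential write of a sorted run, never an in-place update
            self.runs.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins: read amplification
            if key in run:
                return run[key]
        return None
```

Real engines add bloom filters and compaction to bound how many runs a read touches, which is the engineering that keeps LSM read latency acceptable.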
Maturity and Ecosystem
NewSQL is still evolving, with strong adoption in internet companies but limited penetration in highly regulated industries (e.g., banking). Traditional relational databases benefit from decades of stability, extensive tooling, and broad talent pools.
Conclusion
Choosing between NewSQL and middleware‑based sharding depends on factors such as the necessity of strong‑consistent distributed transactions, data growth predictability, scaling frequency, throughput vs. latency priorities, application transparency requirements, and the availability of skilled DBAs.
If most of these considerations align with NewSQL strengths, adopting a NewSQL solution may be worthwhile despite higher learning costs. Otherwise, middleware‑based sharding remains a lower‑risk, cost‑effective choice that leverages existing relational database ecosystems.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.