NewSQL vs Middleware Sharding: Which Architecture Truly Wins?
This article objectively compares NewSQL databases with middleware‑based sharding solutions, examining architecture, distributed transactions, CAP constraints, high availability, scaling, SQL support, storage engines, and maturity to help readers choose the right approach for their workloads.
When discussing database scaling, the choice between NewSQL databases and middleware‑based sharding (traditional relational databases with middleware) is often debated. This article aims to objectively compare the two approaches by analyzing their core characteristics, implementation principles, advantages, disadvantages, and suitable scenarios.
What Makes NewSQL Databases Advanced?
According to the classification in Pavlo and Aslett's SIGMOD Record paper "What's Really New with NewSQL?", systems like Spanner, TiDB, and OceanBase belong to the first category of NewSQL (new architectures), while middleware solutions such as Sharding‑Sphere, Mycat, and DRDS belong to the second (transparent sharding middleware).
Middleware plus sharding is technically a distributed architecture, since storage is spread across nodes and can scale horizontally, but it is often called a "pseudo" distributed database: SQL parsing and execution‑plan generation happen twice, once in the middleware and again in the underlying database.
NewSQL databases differ from middleware‑based sharding in several key ways:
Traditional databases are disk‑oriented, while NewSQL databases efficiently use in‑memory storage and concurrency control.
Middleware repeats SQL parsing and optimization, leading to lower efficiency.
NewSQL distributed transactions are optimized beyond XA, offering higher performance.
NewSQL stores data using Paxos or Raft multi‑replica protocols, providing true high availability and zero data loss (RTO < 30 s, RPO = 0).
NewSQL natively supports automatic sharding, migration, and scaling without requiring application‑level sharding keys.
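To make the middleware side of this contrast concrete, here is a minimal sketch of the routing step a sharding middleware performs on every statement. The names (`route_shard`, `NUM_SHARDS`) are illustrative, not any real product's API; the point is that the application must design the sharding key up front, and a query that lacks it falls back to scatter‑gather across all shards.

```python
NUM_SHARDS = 4  # fixed at design time; changing it later means data migration


def route_shard(user_id: int) -> str:
    """Hash-mod routing: the application must always supply the sharding key."""
    return f"db_{user_id % NUM_SHARDS}"


def route_query(sharding_key):
    """A statement without the sharding key is broadcast to every shard."""
    if sharding_key is None:
        return [f"db_{i}" for i in range(NUM_SHARDS)]  # scatter-gather
    return [route_shard(sharding_key)]
```

A NewSQL store hides exactly this decision: data is range‑partitioned and re‑partitioned internally, so neither the key choice nor the shard count leaks into application code.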
Distributed Transactions
CAP Limitation
NewSQL databases such as Google Spanner claim "effectively CA" behavior with extremely high availability, achieved not by defeating CAP but by minimizing network partitions through private global networks and strong operations teams.
Completeness
Two‑phase commit (2PC) does not guarantee strict ACID under all failure scenarios; recovery mechanisms can eventually restore consistency, but temporary anomalies may occur.
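The classic anomaly is easy to see in a toy simulation of 2PC (all names here are illustrative): once a participant votes "yes" in phase 1, it must hold its locks until the coordinator's decision arrives, so a coordinator crash after prepare leaves participants in doubt.

```python
class Participant:
    def __init__(self):
        self.state = "init"  # init -> prepared -> committed/aborted

    def prepare(self) -> bool:
        self.state = "prepared"  # locks are held from this point on
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(parts, coordinator_crashes_after_prepare=False):
    # Phase 1: collect votes
    if not all(p.prepare() for p in parts):
        for p in parts:
            p.abort()
        return "aborted"
    if coordinator_crashes_after_prepare:
        # No decision reaches the participants: they stay "prepared"
        # (in doubt), holding locks until recovery completes.
        return "in-doubt"
    # Phase 2: broadcast the decision
    for p in parts:
        p.commit()
    return "committed"
```

The "in‑doubt" branch is the blocking window the text refers to: correctness is eventually restored by recovery, but other transactions touching the locked rows stall in the meantime.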
Many NewSQL products still have incomplete distributed‑transaction support, and their performance depends on the underlying implementation.
Performance
Traditional databases use XA 2PC, which incurs high network overhead and blocking time, making it unsuitable for high‑concurrency OLTP workloads.
NewSQL often adopts optimized commit protocols built on 2PC (e.g., Google Percolator's timestamp‑oracle model, or Spanner's TrueTime atomic clocks) combined with MVCC and snapshot isolation, reducing lock contention and improving throughput.
Snapshot isolation (SI) with optimistic concurrency control can cause frequent commit failures under hotspot workloads, and SI is not identical to repeatable read (it permits write skew, for example).
Despite these optimizations, the overhead of acquiring a global transaction timestamp, extra network round trips, and persisting prepare logs still impacts performance, especially as the number of nodes grows.
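The hotspot commit failures mentioned above follow directly from the optimistic, Percolator‑style commit rule: a transaction aborts if any key it writes was committed by another transaction after its start timestamp. This is an illustrative sketch, not TiDB's actual implementation; `TSO` stands in for a timestamp oracle.

```python
class TSO:
    """Timestamp oracle: single source of monotonically increasing timestamps."""

    def __init__(self):
        self.ts = 0

    def next(self) -> int:
        self.ts += 1
        return self.ts


class Store:
    def __init__(self, tso):
        self.tso = tso
        self.commit_ts = {}  # key -> timestamp of its last commit

    def begin(self) -> int:
        return self.tso.next()

    def try_commit(self, start_ts, keys) -> bool:
        # Write-write conflict check: did anyone commit our key after we began?
        if any(self.commit_ts.get(k, 0) > start_ts for k in keys):
            return False  # abort; the caller must retry the whole transaction
        ts = self.tso.next()
        for k in keys:
            self.commit_ts[k] = ts
        return True
```

With many concurrent writers on one "hot" key, all but the first committer abort and retry, which is exactly why hotspot rows hurt SI‑based systems more than lock‑based ones.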
HA and Multi‑Region Active‑Active
Traditional master‑slave replication can still lose data (semi‑synchronous replication, for instance, may degrade to asynchronous on timeout). Modern solutions based on Paxos or Raft (e.g., Spanner, TiDB, OceanBase) provide automatic leader election, majority‑based durability, and fast failover.
While Paxos‑ or Raft‑based multi‑replica consensus can be applied to MySQL (e.g., MySQL Group Replication), true active‑active across distant regions faces latency challenges; network delays of tens of milliseconds make strict strong consistency impractical for many OLTP scenarios.
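The latency penalty has a simple shape: a Raft/Paxos leader must wait for a majority of replicas to acknowledge each write, so commit latency is the ack time of the slowest member of the fastest majority. The helper and the RTT figures below are illustrative only.

```python
def quorum_commit_latency(replica_ack_ms):
    """Commit completes when the (majority)-th fastest replica has acked.

    replica_ack_ms: per-replica ack times in milliseconds, including the
    leader's own local write (modeled as a small value).
    """
    majority = len(replica_ack_ms) // 2 + 1
    return sorted(replica_ack_ms)[majority - 1]


# Three replicas in one city: every write commits in a few ms.
same_city = quorum_commit_latency([1, 2, 3])

# Five replicas across three distant regions: the majority ack must
# cross a WAN link, so tens of ms are unavoidable per write.
cross_region = quorum_commit_latency([1, 2, 30, 35, 40])
```

This is why "three data centers in two cities" style deployments keep a majority of replicas close together: the quorum, not the farthest replica, sets the write latency floor.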
Scalability and Sharding Mechanism
NewSQL databases embed automatic sharding, dynamically splitting hot regions and migrating data without application awareness. In contrast, middleware‑based sharding requires explicit design of sharding keys, routing rules, and manual scaling procedures.
Online scaling for sharding‑plus‑middleware is possible via asynchronous replication and read‑only routing switches, but it demands coordinated middleware and database actions.
Uniform built‑in sharding strategies may not align with domain models, leading to distributed transactions for certain business patterns (e.g., banking core systems).
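The automatic splitting described above can be sketched in a few lines (TiKV‑like in spirit; the threshold and names are illustrative, and real systems split by bytes, e.g. around 96 MiB per region, rather than key count): a region covering a sorted key range splits at its midpoint once it grows past a threshold, with no application involvement.

```python
SPLIT_THRESHOLD = 4  # keys per region; illustrative stand-in for a byte limit


def split_regions(regions):
    """One pass of the background splitter over a list of key-sorted regions."""
    out = []
    for keys in regions:
        if len(keys) > SPLIT_THRESHOLD:
            mid = len(keys) // 2
            out.append(keys[:mid])   # new left region
            out.append(keys[mid:])   # new right region
        else:
            out.append(keys)
    return out
```

After a split, the scheduler can move one of the new regions to another node, which is how hot ranges are spread without any resharding project on the application side.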
Distributed SQL Support
NewSQL databases aim for full MySQL/PostgreSQL compatibility, supporting cross‑shard joins, aggregations, and complex queries through cost‑based optimization (CBO). Middleware solutions often rely on rule‑based optimization (RBO) and lack comprehensive cross‑shard capabilities.
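A small example shows why cross‑shard aggregation needs real query rewriting rather than simple routing: `AVG` cannot be merged from per‑shard averages, so the optimizer must decompose it into `SUM` and `COUNT` pushed down to each shard and combined afterwards. The helpers below are an illustrative sketch of that rewrite.

```python
def shard_partial(rows):
    """What each shard computes locally: the pushed-down SUM(x) and COUNT(x)."""
    return sum(rows), len(rows)


def global_avg(partials):
    """Merge step on the coordinator: combine partial sums and counts."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Averaging the per‑shard averages here would give (2.0 + 10.0) / 2 = 6.0, which is wrong; middleware that lacks this decomposition either returns incorrect results or pulls all rows to the client.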
Storage Engine
Traditional engines use B‑Tree structures optimized for disk access, while NewSQL often adopts LSM trees, converting random writes into sequential writes for higher write throughput, at the cost of read amplification (a lookup may consult several levels).
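The write/read trade‑off is visible even in a toy LSM store (illustrative only): writes go to an in‑memory memtable (plus a sequential log append in a real engine), a full memtable flushes as an immutable sorted run, and a read must check the memtable and then every run from newest to oldest.

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}       # in-memory, absorbs random writes
        self.runs = []           # immutable sorted runs, oldest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: one sequential write of a sorted run, never an in-place update
            self.runs.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins: read amplification
            if key in run:
                return run[key]
        return None
```

Real engines add bloom filters and compaction to bound how many runs a read touches, which is the engineering that keeps LSM read latency acceptable.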
Maturity and Ecosystem
NewSQL is still evolving, with strong adoption in internet companies but limited penetration in highly regulated industries (e.g., banking). Traditional relational databases benefit from decades of stability, extensive tooling, and broad talent pools.
Conclusion
Choosing between NewSQL and middleware‑based sharding depends on factors such as the necessity of strong‑consistent distributed transactions, data growth predictability, scaling frequency, throughput vs. latency priorities, application transparency requirements, and the availability of skilled DBAs.
If most of these considerations align with NewSQL strengths, adopting a NewSQL solution may be worthwhile despite higher learning costs. Otherwise, middleware‑based sharding remains a lower‑risk, cost‑effective choice that leverages existing relational database ecosystems.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.