Why NewSQL? Classification, Technical Challenges, and Discussion
This article reviews the motivations behind NewSQL, outlines its three main categories, examines technical challenges such as main-memory storage, sharding, concurrency control, secondary indexes, replication, and crash recovery, and discusses its scalability and architectural trade-offs.
Why NewSQL?
Database workloads grew dramatically with the rise of the Internet around 2000, creating scalability bottlenecks that traditional relational databases could not handle. Two primary scaling approaches emerged: vertical scaling (more powerful hardware) and horizontal scaling (sharding via middleware). Vertical scaling incurs high cost and downtime, while middleware‑based sharding struggles with distributed transactions and complex joins, prompting the need for a new solution.
NewSQL Definition and Benefits
NewSQL targets OLTP workloads, offering the scalability and performance of NoSQL while preserving ACID‑compliant transactions and the relational model. Its advantages include easy scalability and simplified application logic thanks to native transaction support.
NewSQL Classification
NewSQL systems fall into three categories:
New architectures built from the ground up.
Transparent sharding middleware.
Database‑as‑a‑Service (DBaaS) offerings.
New Architectures
Key characteristics of new‑architecture NewSQL include:
No shared storage.
Multi‑node concurrency control.
Replication for high availability and disaster recovery.
Traffic control.
Distributed query processing.
Advantages are distributed‑aware query optimization, data‑local query routing, and flexible multi‑replica storage. The main drawback is a shortage of skilled operators. Representative products: Google Spanner, CockroachDB.
Transparent Sharding Middleware
Middleware responsibilities:
Routing queries to appropriate shards.
Coordinating distributed transactions.
Managing data placement, replication, and partitioning.
Pros: applications need little or no changes. Cons: underlying nodes still run traditional disk‑based DBMSs, leading to inefficient use of large‑memory, multi‑core servers and duplicated query planning/optimization.
Representative products: MariaDB MaxScale, ScaleArc.
Database‑as‑a‑Service (DBaaS)
Features:
On‑demand usage.
Cloud‑native storage enabling easy scalability.
Representative products: Amazon Aurora, ClearDB.
Technical Challenges of NewSQL
Main Memory Storage
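Traditional databases use a disk-oriented design in which a buffer pool caches pages in memory. On modern servers whose memory can hold most or all of an OLTP data set, that design adds avoidable overhead, so NewSQL systems need memory management techniques that go beyond simple page caching, for example ways to move cold data out to disk when the data set outgrows memory.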
Partitioning/Sharding
Data is partitioned by hash or range on selected columns, requiring the DBMS to execute SQL across partitions, merge results, support online node addition/removal, and enable online migration or replication of partitions.
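As a rough illustration of the two partitioning schemes and of merging results across partitions, here is a minimal Python sketch. The function names (hash_partition, range_partition, scatter_gather_count) and the row layout are hypothetical and not taken from any particular NewSQL system; online rebalancing and migration are out of scope.

```python
import bisect
import hashlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def range_partition(key: int, upper_bounds: list) -> int:
    """Map a key to the first range whose exclusive upper bound exceeds it.

    upper_bounds must be sorted; keys at or above the last bound fall into
    a final overflow partition.
    """
    return bisect.bisect_right(upper_bounds, key)


def scatter_gather_count(partitions: list, predicate) -> int:
    """Run the same filter on every partition and merge the partial counts."""
    return sum(sum(1 for row in part if predicate(row)) for part in partitions)
```

Hash partitioning spreads keys evenly but scatters adjacent keys, while range partitioning keeps adjacent keys together at the cost of possible hot spots; which one fits depends on the workload.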
Concurrency Control
Concurrency control provides the atomicity and isolation guarantees of ACID. Atomicity across nodes typically relies on a variant of two-phase commit (2PC), driven either by a central coordinator or by a decentralized scheme that depends on synchronised clocks to order transactions. For example, Spanner's TrueTime relies on GPS receivers and atomic clocks, while CockroachDB uses a hybrid logical clock that combines loosely synchronised hardware clocks with logical counters.
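The central-coordinator form of 2PC can be sketched in a few lines of Python. This is a simplified illustration, not how any specific product implements it: the Participant class and its can_commit flag are hypothetical, and a real system would also write durable log records and handle timeouts and coordinator failure.

```python
class Participant:
    """One partition taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # assumption: stands in for local validation
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote. A real node would force a prepare record to its log first.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Commit on every participant or on none."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    if all(votes):
        for p in participants:                     # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                         # phase 2: abort everywhere
        p.abort()
    return False
```

A single "no" vote in the first phase aborts the transaction on every participant, which is what gives the all-or-nothing behaviour.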
Isolation Techniques
2PL (Two‑Phase Locking)
MVCC (Multiversion Concurrency Control)
OCC (Optimistic Concurrency Control)
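Most NewSQL systems adopt MVCC (e.g., CockroachDB) or combine 2PL with MVCC (e.g., Spanner). The combined approach is the same one used by MySQL's InnoDB engine, which is a traditional storage engine rather than a NewSQL system.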
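To make the MVCC idea concrete, the following toy Python store keeps every committed version of a key and lets a reader see only the versions committed at or before its snapshot timestamp. The MVCCStore class and its single logical counter are assumptions for illustration; write-write conflict detection and garbage collection of old versions are left out.

```python
class MVCCStore:
    """Toy multiversion store: each write appends a (commit_ts, value) version."""

    def __init__(self):
        self._versions = {}   # key -> list of (commit_ts, value), ascending ts
        self._clock = 0       # logical timestamp counter (assumption)

    def begin(self) -> int:
        """Return a snapshot timestamp; the reader sees commits up to this point."""
        return self._clock

    def write(self, key, value) -> int:
        """Commit a new version and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None
```

Because a reader only consults versions at or before its snapshot, later writers never block it and never change what it sees.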
Secondary Indexes
Two implementation styles exist:
Local indexes: each partition holds a fragment of the index; updates affect a single node, but reads may span many nodes.
Global indexes: every node stores a complete copy of the index; updates require a distributed transaction to keep all copies in sync, but a read can be served by a single node.
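The trade-off between the two styles can be shown by counting how many nodes each operation touches. In this hypothetical Python sketch, rows are placed by row_id modulo the node count and indexed on a made-up city column; the Cluster class is illustrative only, and the full-copy updates are applied in a plain loop rather than a real distributed transaction.

```python
from collections import defaultdict


class Cluster:
    """Rows are partitioned by id; 'city' is the secondary-index column."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # Local style: one index fragment per node, covering only that node's rows.
        self.local_index = [defaultdict(set) for _ in range(num_nodes)]
        # Global style: a full copy of the index on every node.
        self.global_index = [defaultdict(set) for _ in range(num_nodes)]

    def insert(self, row_id, city):
        """Return how many nodes each index style had to update."""
        home = row_id % self.num_nodes
        self.local_index[home][city].add(row_id)   # touches one node
        for index_copy in self.global_index:       # touches every node
            index_copy[city].add(row_id)
        return {"local": 1, "global": self.num_nodes}

    def lookup_local(self, city):
        """Scatter-gather: ask every fragment, then merge."""
        ids = set()
        for fragment in self.local_index:
            ids |= fragment.get(city, set())
        return ids, self.num_nodes                 # (result, nodes contacted)

    def lookup_global(self, city, node=0):
        """Any single node can answer from its full copy."""
        return set(self.global_index[node].get(city, set())), 1
```

Write-heavy workloads tend to favour the local style and read-heavy workloads the global style, since each style makes one of the two operations single-node.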
Replication
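Two design decisions dominate: how consistency is enforced (for example with a consensus protocol such as Paxos, or with 2PC across partitions), and the replication mode. In the first mode every replica executes each command synchronously; in the second the primary executes it and ships the resulting state to the other replicas asynchronously.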
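The difference between the two replication modes can be sketched as follows. This is a deliberately simplified Python illustration: the Replica and AsyncPrimary classes and the ("incr", key, amount) command format are hypothetical, and there is no networking, failure handling, or consensus protocol involved.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def execute(self, command):
        """Apply a command of the form ("incr", key, amount)."""
        op, key, amount = command
        if op != "incr":
            raise ValueError(f"unknown op: {op}")
        self.data[key] = self.data.get(key, 0) + amount

    def apply_state(self, key, value):
        """Install a value that was computed on another node."""
        self.data[key] = value


def replicate_command(replicas, command):
    """Synchronous command execution: every replica runs the command itself
    before the write is acknowledged."""
    for replica in replicas:
        replica.execute(command)


class AsyncPrimary(Replica):
    """Asynchronous state sync: the primary executes, acknowledges, and queues
    the resulting state for the secondaries to apply later."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.pending = []

    def write(self, command):
        self.execute(command)
        key = command[1]
        self.pending.append((key, self.data[key]))   # ship state, not the command

    def flush(self):
        for key, value in self.pending:
            for secondary in self.secondaries:
                secondary.apply_state(key, value)
        self.pending.clear()
```

Until flush() runs, the secondaries lag behind the primary; that window is the price of acknowledging writes without waiting for the replicas.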
Crash Recovery
Minimising downtime typically involves failing over from the primary to a secondary replica, and using checkpoints so that a recovered or newly added node can catch up without replaying the entire log.
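How a checkpoint shortens catch-up can be seen in a small Python sketch. The RecoverableStore class, the recover function, and the (lsn, key, value) log format are assumptions made for illustration; a real system would persist both the log and the checkpoint and would coordinate the checkpoint with in-flight transactions.

```python
import copy


class RecoverableStore:
    """Toy key-value store with an append-only log and periodic checkpoints."""

    def __init__(self):
        self.data = {}
        self.log = []                  # list of (lsn, key, value)
        self.checkpoint = ({}, 0)      # (snapshot of data, last lsn it includes)

    def put(self, key, value):
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))   # log first, then apply
        self.data[key] = value

    def take_checkpoint(self):
        self.checkpoint = (copy.deepcopy(self.data), len(self.log))


def recover(checkpoint, log):
    """Rebuild state from a checkpoint plus only the log records after it."""
    snapshot, last_lsn = checkpoint
    data = copy.deepcopy(snapshot)
    replayed = 0
    for lsn, key, value in log:
        if lsn > last_lsn:
            data[key] = value
            replayed += 1
    return data, replayed
```

The node being rebuilt replays only the records written after the checkpoint, so recovery time depends on the checkpoint interval rather than on the total length of the log.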
Discussion
Scalability is a core NewSQL attribute, yet middleware-based solutions face inherent scalability limits tied to the routing metadata the middleware must maintain. Other open challenges include multi-tenant isolation and load balancing. NewSQL also introduces few fundamentally new theoretical ideas; it is mainly an architectural integration of existing techniques to meet the demands of modern distributed systems.
Architecture Digest