
Why NewSQL Matters: From Relational Roots to Modern HTAP

This article traces the evolution of database systems—from early navigational models and Codd's relational theory through the rise of NoSQL and sharding, to the resurgence of NewSQL and the emerging HTAP paradigm—highlighting technical motivations, design trade‑offs, and future directions.

ITPUB

Jan 27, 2018


Evolution of Database Technologies

The first widely‑deployed databases were navigational systems such as IBM IMS, which stored records on magnetic tape and required applications to follow explicit pointers. In 1970 Edgar F. Codd published "A Relational Model of Data for Large Shared Data Banks", introducing a mathematical foundation that eliminated ordering, index, and access‑path dependencies. This model enabled flexible queries and gave rise to modern relational DBMSs (Oracle, DB2, MySQL, PostgreSQL) that have dominated the field for four decades.

Sharding for Internet‑scale Applications

High‑concurrency web services often split a logical database across many physical nodes, a technique called sharding or horizontal partitioning. The simplest scheme hashes a chosen column and takes the remainder modulo the number of shards N:

hash(some_field_in_record) % N

When the cluster grows (e.g., from 5 to 7 nodes), the modulus changes and most records map to a different shard, forcing a costly data‑rebalancing operation that can cause prolonged downtime. Consistent hashing reduces the amount of data that must move but may introduce load skew. Sharding also complicates multi‑row transactions and joins, because each shard holds only a fragment of the data.
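The rebalancing cost described above can be measured with a small experiment. The following is a minimal sketch, not taken from any particular sharding product: the key names, the `HashRing` class, and the choice of 100 virtual nodes per shard are all assumptions made for illustration. It counts how many keys change shards when a cluster grows from 5 to 7 nodes under modulo placement and under a consistent-hash ring.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def modulo_shard(key: str, num_shards: int) -> int:
    """The scheme from the text: hash(some_field_in_record) % N."""
    return stable_hash(key) % num_shards


class HashRing:
    """Hypothetical consistent-hash ring with several virtual nodes per shard."""

    def __init__(self, shards, vnodes=100):
        # Each shard contributes `vnodes` points on the ring.
        self._points = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def shard_for(self, key: str) -> str:
        # Pick the first ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(keys, before, after) -> float:
    """Fraction of keys whose shard differs between two placement functions."""
    moved = sum(1 for k in keys if before(k) != after(k))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]  # made-up keys

modulo_moved = moved_fraction(
    keys, lambda k: modulo_shard(k, 5), lambda k: modulo_shard(k, 7)
)

ring5 = HashRing([f"shard-{i}" for i in range(5)])
ring7 = HashRing([f"shard-{i}" for i in range(7)])
ring_moved = moved_fraction(keys, ring5.shard_for, ring7.shard_for)

print(f"modulo,    5 -> 7 shards: {modulo_moved:.0%} of keys move")
print(f"hash ring, 5 -> 7 shards: {ring_moved:.0%} of keys move")
```

With uniformly distributed hashes, a key keeps its shard under modulo placement only when `h % 5 == h % 7`, which holds for about one key in seven. On the ring, only the keys that now belong to the two new shards move, which is roughly two sevenths of the data, though the exact share depends on where the virtual nodes land.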

Sharding middleware works well for single‑record reads or updates, but executing multi‑record transactions or cross‑shard joins requires additional application logic.
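To make the extra application logic concrete, here is a small sketch in which each shard is modeled as an in-memory dictionary. The order data, the three-shard layout, and all function names are hypothetical. Single-record operations are routed to one shard, while a per-customer total has to be scattered to every shard and merged by the caller.

```python
from collections import Counter

NUM_SHARDS = 3  # assumed cluster size for this sketch

# Each shard is modeled as a dict: order_id -> (customer, amount).
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(order_id: int) -> dict:
    """Route by the sharding key (here simply order_id % NUM_SHARDS)."""
    return shards[order_id % NUM_SHARDS]


def insert_order(order_id: int, customer: str, amount: int) -> None:
    # Single-record write: touches exactly one shard.
    shard_for(order_id)[order_id] = (customer, amount)


def get_order(order_id: int):
    # Single-record read: also answered by one shard.
    return shard_for(order_id).get(order_id)


def local_totals(shard: dict) -> Counter:
    # Work a shard could do on its own: aggregate only its fragment.
    totals = Counter()
    for customer, amount in shard.values():
        totals[customer] += amount
    return totals


def total_per_customer() -> dict:
    # Cross-shard aggregate: scatter to every shard, then merge the
    # partial results in the application (or middleware) layer.
    merged = Counter()
    for shard in shards:
        merged.update(local_totals(shard))
    return dict(merged)


insert_order(1, "alice", 30)
insert_order(2, "bob", 20)
insert_order(3, "alice", 50)
insert_order(4, "carol", 10)

print(get_order(3))
print(total_per_customer())
```

The sketch leaves out the harder part: a transaction that updates rows on two shards would additionally need a coordination protocol so that both updates commit or neither does.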

NewSQL: Combining Scalability and ACID Guarantees

NewSQL systems aim to provide NoSQL‑level horizontal scalability while preserving the ACID guarantees of traditional relational databases. The ACID properties are:

Atomicity – all operations of a transaction succeed or none do.

Consistency – transaction results leave the database in a valid state according to defined constraints.

Isolation – concurrent transactions do not interfere with each other.

Durability – committed changes survive crashes.
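Atomicity and consistency can be observed on a single node with Python's built-in `sqlite3` module. SQLite is used here only because it ships with Python; it is not a NewSQL system, and the `account` table and `transfer` function are invented for this sketch. A transfer that would violate the `CHECK` constraint is rolled back as a whole, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on commit."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            # Credit first, so a failed debit has something to undo.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


ok = transfer(conn, 1, 2, 30)         # valid transfer
rejected = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM account").fetchall())

print(ok, rejected, balances)
```

The second call returns `False` and the balances stay at the values left by the first transfer. A distributed database has to provide the same all-or-nothing behavior when the two rows live on different nodes.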

Typical OLTP workloads that benefit from NewSQL have three characteristics:

Short‑lived transactions (they never stall waiting for user input).

Small footprint: each transaction touches a small subset of rows through index lookups (no full table scans).

Repetitive query patterns: the same statements run repeatedly with different parameters.
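The three characteristics above can be seen together in a short sketch. The `orders` table and the `mark_shipped` function are hypothetical, and `sqlite3` stands in for whatever database is actually used. The same parameterized statement runs once per request, each run is a brief transaction, and each run reaches one row through the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, f"customer-{i % 10}", "new") for i in range(1, 1001)],
)
conn.commit()

# One statement text, reused with different parameters on every call.
MARK_SHIPPED = "UPDATE orders SET status = 'shipped' WHERE id = ?"


def mark_shipped(conn, order_id):
    """A short transaction that updates a single row by primary key."""
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(MARK_SHIPPED, (order_id,))
    return cursor.rowcount  # number of rows the statement changed


touched = [mark_shipped(conn, order_id) for order_id in (7, 42, 999)]
shipped = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'shipped'"
).fetchone()[0]

print(touched, shipped)
```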

NewSQL implementations fall into three broad categories:

New‑architecture systems built from the ground up on a shared‑nothing, distributed design. Examples: ClustrixDB, CockroachDB, Google Spanner, MemSQL, NuoDB.

Sharding‑middleware layers that present a single logical RDBMS on top of existing databases. Examples: AgilData Scalable Cluster, MariaDB MaxScale, ScaleArc, ScaleBase.

Cloud‑native DBaaS offerings where the provider manages the infrastructure. Examples: Amazon Aurora, ClearDB.

Systems in all three categories draw, to varying degrees, on techniques such as in‑memory storage, fine‑grained data partitioning, advanced concurrency control, secondary indexes, synchronous replication, and fault‑tolerant recovery.

Hybrid Transaction‑Analytical Processing (HTAP)

Traditional architectures separate OLTP (transactional) and OLAP (analytical) workloads, leading to data duplication, ETL latency, and higher storage costs. HTAP seeks to unify these workloads in a single engine, allowing fast inserts/updates and low‑latency analytics on the same data set.

Projects pursuing HTAP include Apache Kudu, which provides a columnar store optimized for both writes and scans. Several NewSQL vendors (CockroachDB, ClustrixDB, MemSQL) have roadmaps toward HTAP, although widespread production adoption remains limited.
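The tension between transactional and analytical access comes down largely to data layout. The sketch below is a toy illustration with made-up sales data; it does not reflect how Kudu or any other engine stores data internally. It holds the same three records once as rows and once as columns, then runs a point lookup and an aggregate against them.

```python
# Row layout: each record is stored together, which suits point reads and writes.
rows = [
    {"id": 1, "region": "eu", "amount": 30},
    {"id": 2, "region": "us", "amount": 20},
    {"id": 3, "region": "eu", "amount": 50},
]

# Column layout: each attribute is stored together, which suits scans
# that read one or two attributes across many records.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def point_lookup(rows, row_id):
    """OLTP-style access: fetch one whole record."""
    return next(row for row in rows if row["id"] == row_id)


def total_amount_rowwise(rows):
    """Analytical query on the row layout: every record is visited in full."""
    return sum(row["amount"] for row in rows)


def total_amount_columnar(columns):
    """Same query on the column layout: only the 'amount' column is read."""
    return sum(columns["amount"])


print(point_lookup(rows, 2))
print(total_amount_rowwise(rows), total_amount_columnar(columns))
```

Both layouts give the same answers. What differs is how much unrelated data each query has to read, and an HTAP engine has to serve both access patterns without keeping two separately maintained copies.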

Key Techniques Employed by Modern NewSQL Systems

In‑memory storage for sub‑millisecond latency.

Deterministic data partitioning (hash‑based or range‑based) to distribute load.

Optimistic or lock‑free concurrency control to reduce contention.

Secondary indexes that support ad‑hoc queries without full scans.

Multi‑region synchronous replication for high availability.

Automated fault recovery and node‑replacement mechanisms.
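Of the techniques listed above, optimistic concurrency control is the easiest to sketch in a few lines. The `VersionedStore` class below is a hypothetical, single-threaded illustration, not the protocol of any named system. A transaction records the version of every key it reads, and at commit time the store applies the writes only if none of those versions has changed.

```python
class VersionedStore:
    """Toy key-value store with per-key versions for optimistic validation."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key has version 0."""
        return self._data.get(key, (0, None))

    def commit(self, read_versions, writes):
        """Validate the read set, then apply the write set.

        read_versions: key -> version the transaction observed
        writes:        key -> new value
        Returns True on commit, False if validation fails.
        """
        # Validation phase: abort if any key read has since been modified.
        for key, seen in read_versions.items():
            if self.read(key)[0] != seen:
                return False
        # Write phase: install the new values and bump their versions.
        for key, value in writes.items():
            version, _ = self.read(key)
            self._data[key] = (version + 1, value)
        return True


store = VersionedStore()
store.commit({}, {"stock": 10})  # initial load, version becomes 1

# Two transactions read the same version of "stock".
v1, stock1 = store.read("stock")
v2, stock2 = store.read("stock")

first = store.commit({"stock": v1}, {"stock": stock1 - 1})   # commits
second = store.commit({"stock": v2}, {"stock": stock2 - 1})  # stale read, aborts

# The aborted transaction retries against fresh data.
v3, stock3 = store.read("stock")
retry = store.commit({"stock": v3}, {"stock": stock3 - 1})

print(first, second, retry, store.read("stock"))
```

No locks are held while a transaction runs, and conflicts are only detected at commit. In a real multi-threaded or distributed engine the validation and write phases would themselves have to execute atomically, which this sketch does not attempt.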

Illustrative SQL Examples

Creating a simple table and inserting rows:

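-- One row per student: numeric id, name, and a single-letter level.
CREATE TABLE score (id INT, name VARCHAR(50), level CHAR(1));
-- Naming the columns keeps the inserts valid if the table later gains columns.
INSERT INTO score (id, name, level) VALUES (12, 'John', 'B');
INSERT INTO score (id, name, level) VALUES (19, 'Lily', 'A');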

Querying for all students whose level is 'A':

SELECT * FROM score WHERE level = 'A';
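The statements above can be run as-is from Python's built-in `sqlite3` module, which also makes it easy to see what a secondary index changes. This is a sketch under assumptions: SQLite stands in for a real NewSQL engine, the index name `idx_score_level` and the `plan` helper are invented here, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (id INT, name VARCHAR(50), level CHAR(1))")
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [(12, "John", "B"), (19, "Lily", "A")],
)

QUERY = "SELECT * FROM score WHERE level = 'A'"


def plan(conn, sql):
    """Return the human-readable steps of SQLite's plan for `sql`."""
    # The last column of each EXPLAIN QUERY PLAN row describes one step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(conn, QUERY)  # no index on level yet: a table scan

# Hypothetical secondary index on the filtered column.
conn.execute("CREATE INDEX idx_score_level ON score (level)")
plan_after = plan(conn, QUERY)   # now a search through the index

result = conn.execute(QUERY).fetchall()

print(plan_before)
print(plan_after)
print(result)
```

Before the index exists the plan reports a scan of `score`; afterwards it reports a search using `idx_score_level`. The query result is the same in both cases.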

References

Codd, E.F. (1970). “A Relational Model of Data for Large Shared Data Banks”. Communications of the ACM.

Pavlo, A., Aslett, M. (2016). “What’s Really New with NewSQL?”. ACM SIGMOD Record.

Corbett, J. et al. (2012). “Spanner: Google’s Globally‑Distributed Database”. OSDI.

Bacon, D.F. et al. (2017). “Spanner: Becoming a SQL System”. SIGMOD.

ClustrixDB – https://www.clustrix.com/

CockroachDB – https://www.cockroachlabs.com/

Apache Kudu – https://kudu.apache.org/


