
What Goes Around: 20‑Year Evolution of Database Systems and Future Trends

This article reviews two decades of database research, analyzing the rise and decline of various data models—from hierarchical and relational to NoSQL, vector, and graph databases—while highlighting how AI, cloud, and hardware advances are reshaping DBMS architecture and predicting which approaches will dominate tomorrow’s data landscape.

21CTO

Jul 30, 2024


1 Introduction

In 2005, Michael Stonebraker and Joseph Hellerstein contributed a chapter titled “What Goes Around Comes Around”, which examined the major data-model movements since the 1960s. The authors concluded that object-relational models with extensible type systems had come to dominate the market, while many legacy non-relational systems survived only as costly maintenance workloads.

Since that survey, DBMSs have expanded far beyond traditional transaction processing to support big-data workloads, machine-learning integration, and cloud deployment.

2 Data Models and Query Languages

The paper classifies the past twenty years of activity into nine categories:

MapReduce systems

Key/Value stores

Document databases

Column‑family (wide‑column) stores

Text search engines

Array databases

Vector databases

Graph databases

Other emerging models

2.1 MapReduce Systems

Google built its MapReduce (MR) framework in 2003 to process web-crawl data. In database terms, Map is a user-defined function that performs computation or filtering, while Reduce corresponds to a GROUP BY operation. Conceptually, an MR job resembles the following query:

SELECT map() FROM crawl_table GROUP BY reduce()

Hadoop, an open-source MR implementation, ran on top of the Hadoop Distributed File System (HDFS). Over time, MR lost relevance as Hadoop vendors added SQL layers (e.g., Hive) and Google migrated its own pipelines to BigTable.
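The correspondence between Map, GROUP BY, and Reduce can be illustrated with a small single-process sketch. It is not how Hadoop or Google's framework is implemented; the names `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample `crawl_table` rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def map_fn(record):
    """Map: emit (key, value) pairs from one input record (a made-up crawl row)."""
    _url, text = record
    for word in text.lower().split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce: fold all values sharing a key, like an aggregate under GROUP BY."""
    return key, sum(values)

def run_mapreduce(records):
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # "shuffle" step: group values by key
    return dict(reduce_fn(key, values) for key, values in groups.items())

# Hypothetical input: (url, page text) pairs.
crawl_table = [
    ("a.example", "sql is back"),
    ("b.example", "SQL never left"),
]
counts = run_mapreduce(crawl_table)
```

Here `counts` maps each word to the number of times it appears across all pages, which is what a `GROUP BY word` with `COUNT(*)` would return in SQL.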

2.2 Key/Value Stores

Key/Value (KV) stores represent the simplest data model: a binary association (key, value). Early KV systems such as Memcached, Redis, Amazon Dynamo, and BerkeleyDB provided fast GET/SET/DELETE operations but required applications to manage schema and indexing. Modern KV engines (e.g., RocksDB, LevelDB) are often used as storage back‑ends for higher‑level DBMS.
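A minimal in-memory sketch of the GET/SET/DELETE interface is shown below. The class `KVStore` and the key `user:1000` are hypothetical; real systems such as Redis or RocksDB expose their own APIs. The point is that the store treats values as opaque bytes, so the application has to decide how to encode and interpret them.

```python
import json

class KVStore:
    """In-memory stand-in for a KV store; keys and values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        # Returns True if something was removed.
        return self._data.pop(key, None) is not None

store = KVStore()

# The application chooses the encoding (JSON here); the store never inspects it.
store.set(b"user:1000", json.dumps({"name": "Alice"}).encode("utf-8"))
raw = store.get(b"user:1000")
user = json.loads(raw.decode("utf-8"))
```

Because the store cannot look inside `raw`, a question like "which users are named Alice?" requires the application to maintain its own index or scan every key.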

2.3 Document Databases

Document databases store collections of JSON‑like records. Each document contains hierarchical field/value pairs. Example JSON document:

{ "name": "First Last", "orders": [ { "id": 123, "items": [...] }, { "id": 456, "items": [...] } ] }

Originally popular for their schema‑less flexibility, many NoSQL document stores (e.g., MongoDB) have since added SQL interfaces and ACID support, narrowing the gap with relational systems.
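As a rough illustration of the hierarchical model, the sketch below walks a nested document by path. The helper `get_path` and the item values are hypothetical (the example document above elides its items), and this does not reflect any particular product's query API.

```python
# A document shaped like the example above; the item values are made up.
customer = {
    "name": "First Last",
    "orders": [
        {"id": 123, "items": ["widget"]},
        {"id": 456, "items": ["gadget", "gizmo"]},
    ],
}

def get_path(doc, path):
    """Follow a dotted path such as 'orders.1.id' through dicts and lists."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # list positions are addressed by index
        else:
            current = current[part]       # dict fields are addressed by name
    return current

second_order_id = get_path(customer, "orders.1.id")
order_ids = [order["id"] for order in customer["orders"]]
```

In a relational design the same data would typically be split into a customers table and an orders table joined by a key; the document model nests the orders inside the customer record instead.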

2.4 Column‑Family Stores

Column‑family (wide‑column) stores such as Google BigTable, Apache Cassandra, and HBase model data as sparse rows with column families. Example mapping:

User1000 → { "name": "Alice", "accounts": [123,456], "email": "[email protected]" }
User1001 → { "name": "Bob", "email": ["[email protected]","[email protected]"] }

These systems excel at high‑throughput writes but often lack joins and secondary indexes.
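The sparse-row idea can be sketched as nested maps: row key, then column family, then column. The functions `put`, `get`, and `find_rows` and the family name `profile` are hypothetical and do not correspond to the BigTable, Cassandra, or HBase client APIs.

```python
from collections import defaultdict

# row key -> column family -> column -> value; absent columns take no space.
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, column, value):
    table[row_key][family][column] = value

def get(row_key, family, column, default=None):
    # Chained .get() calls avoid creating empty rows or families on reads.
    return table.get(row_key, {}).get(family, {}).get(column, default)

def find_rows(family, column, value):
    """Without a secondary index, lookup by value means scanning every row."""
    return [
        row_key
        for row_key, families in table.items()
        if families.get(family, {}).get(column) == value
    ]

put("User1000", "profile", "name", "Alice")
put("User1000", "profile", "accounts", [123, 456])
put("User1001", "profile", "name", "Bob")  # this row has no "accounts" column

bob_accounts = get("User1001", "profile", "accounts")
rows_named_bob = find_rows("profile", "name", "Bob")
```

Lookups by row key are direct, while `find_rows` touches every row, which is one way to see why the absence of secondary indexes matters.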

2.5 Text Search Engines

Text search engines (e.g., Elasticsearch, Solr) implement inverted indexes to support full-text queries. They provide only limited transactional guarantees, and most major RDBMSs now embed full-text search capabilities, reducing the need for separate engines.
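An inverted index maps each term to the set of documents that contain it. The sketch below is a deliberately simplified version, assuming whitespace tokenization and AND semantics; production engines add stemming, ranking, and compressed postings. The names `build_index` and `search` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # intersect posting sets
    return result

docs = {
    1: "relational databases win",
    2: "vector databases rise",
    3: "relational model",
}
index = build_index(docs)
hits = search(index, "relational databases")
```

The query is answered by intersecting the posting sets for each term, without scanning the document text.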

2.6 Array Databases

Array databases (e.g., rasdaman, kdb+, SciDB) store multi-dimensional data, typically from scientific applications. They support efficient slicing and aggregation across arbitrary dimensions, a capability that traditional row-oriented RDBMSs lack.
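Slicing and aggregating along a dimension can be sketched with a small three-dimensional array. Plain nested lists are used here to keep the example dependency-free; the dimension names (time, row, column), the helper names, and the values are hypothetical.

```python
# A tiny 3-D array indexed as grid[time][row][col]: 2 time steps, 2 rows, 3 columns.
grid = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]

def cell_series(data, row, col):
    """Slice along the time dimension: one cell's value at every time step."""
    return [step[row][col] for step in data]

def mean_over_time(data):
    """Aggregate away the time dimension, leaving a 2-D row x col array."""
    steps, rows, cols = len(data), len(data[0]), len(data[0][0])
    return [
        [sum(data[t][r][c] for t in range(steps)) / steps for c in range(cols)]
        for r in range(rows)
    ]

series = cell_series(grid, row=1, col=2)
averaged = mean_over_time(grid)
```

An array DBMS stores data so that both operations follow the array's layout, whereas a row-oriented table of (time, row, col, value) tuples would need filters and GROUP BY over every cell.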

2.7 Vector Databases

Vector databases specialize in similarity search over high-dimensional embeddings generated by AI models. An example record format:

(title, date, author, [embedding_vector])

Systems such as Pinecone, Milvus, and Weaviate build approximate nearest-neighbor (ANN) indexes to accelerate these queries. Several general-purpose DBMSs (e.g., Oracle, ClickHouse, and PostgreSQL via extensions) have recently added native vector index support.
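To make the query concrete, the sketch below performs an exact, brute-force nearest-neighbor search using cosine similarity. This is not an ANN index: ANN structures trade some accuracy to avoid comparing the query against every record. The names `cosine` and `top_k`, the sample records, and the three-dimensional embeddings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(records, query, k):
    """Exact search: score every record, then keep the k most similar titles."""
    ranked = sorted(records, key=lambda rec: cosine(rec[3], query), reverse=True)
    return [rec[0] for rec in ranked[:k]]

# Records follow the (title, date, author, embedding) shape; values are made up.
records = [
    ("Intro to SQL", "2024-01-01", "A", [1.0, 0.0, 0.0]),
    ("Graph basics", "2024-02-01", "B", [0.0, 1.0, 0.0]),
    ("SQL tuning", "2024-03-01", "C", [0.6, 0.8, 0.0]),
]
nearest = top_k(records, [1.0, 0.05, 0.0], k=2)
```

The cost of this approach grows linearly with the number of records, which is the motivation for the index structures that vector databases and vector-enabled RDBMSs provide.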

2.8 Graph Databases

Graph databases model data as nodes and edges, using either RDF triples or property graphs. They are used for social-network analysis, knowledge graphs, and recommendation engines. Graph query languages and extensions (e.g., Cypher, PGQL, SQL/PGQ) are increasingly supported by relational systems, narrowing the performance gap.
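A property graph can be sketched as a node map plus a list of typed edges, which is also how it would be laid out in relational tables. The breadth-first traversal below is a hypothetical illustration of a bounded multi-hop query; the names `neighbors` and `reachable_within` and the sample data are not taken from any graph product.

```python
from collections import deque

# Nodes carry properties; edges are (source, type, destination) tuples.
nodes = {
    1: {"label": "Person", "name": "Ann"},
    2: {"label": "Person", "name": "Ben"},
    3: {"label": "Person", "name": "Cy"},
    4: {"label": "Person", "name": "Di"},
}
edges = [(1, "FOLLOWS", 2), (2, "FOLLOWS", 3), (3, "FOLLOWS", 4)]

def neighbors(node_id, edge_type):
    return [dst for src, etype, dst in edges if src == node_id and etype == edge_type]

def reachable_within(start, edge_type, max_hops):
    """Breadth-first search: nodes reachable from start in at most max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node, edge_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, depth + 1))
    return found

two_hop_names = [nodes[n]["name"] for n in reachable_within(1, "FOLLOWS", 2)]
```

In SQL, the same traversal over an edge table would be written as a chain of self-joins or a recursive query, which is the kind of workload that graph extensions to relational systems target.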

3 Conclusions

The survey finds that most non‑SQL, non‑relational systems occupy niche markets or are being absorbed into SQL‑centric ecosystems. MapReduce is largely obsolete, KV stores are often embedded in modern RDBMS, document and column‑family stores are converging toward SQL interfaces, and emerging vector and graph databases are rapidly gaining native support in relational products. The authors anticipate continued convergence, with SQL‑based systems providing the broadest functionality while specialized engines persist only where they offer clear performance or domain advantages.
