What Goes Around: 20‑Year Evolution of Database Systems and Future Trends
This article reviews two decades of database research, analyzing the rise and decline of various data models—from hierarchical and relational to NoSQL, vector, and graph databases—while highlighting how AI, cloud, and hardware advances are reshaping DBMS architecture and predicting which approaches will dominate tomorrow’s data landscape.
1 Introduction
In 2005, Michael Stonebraker and Joseph Hellerstein contributed a chapter titled “What Goes Around Comes Around”, which examined the major data‑model movements since the 1960s. The authors concluded that object‑relational models with extensible type systems had come to dominate the market, while many legacy non‑relational systems survived only as costly maintenance workloads.
Since that survey, DBMSs have expanded far beyond traditional transaction processing to support big‑data workloads, machine‑learning integration, and cloud deployment.
2 Data Models and Query Languages
The paper classifies the past twenty years of activity into nine categories:
MapReduce systems
Key/Value stores
Document databases
Column‑family (wide‑column) stores
Text search engines
Array databases
Vector databases
Graph databases
Other emerging models
2.1 MapReduce Systems
Google built its MapReduce (MR) framework in 2003 to process web‑crawling data. In database terminology, Map is a user‑defined function that performs computation or filtering, while Reduce corresponds to a GROUP BY operation. Expressed as SQL, an MR job looks roughly like:
SELECT map() FROM crawl_table GROUP BY reduce()
Hadoop, an open‑source MR implementation, ran on top of the HDFS distributed file system. Over time, MR lost relevance as Hadoop vendors added SQL layers (e.g., Hive) and Google migrated its own pipelines to BigTable.
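The Map-as-UDF and Reduce-as-GROUP BY analogy described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not the Google or Hadoop API; the names `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample crawl records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical crawl records of the form (url, page_text).
CRAWL_TABLE = [
    ("a.example/1", "sql is back"),
    ("a.example/2", "sql never left"),
    ("b.example/1", "graphs and sql"),
]

def map_fn(record):
    """Map: a user-defined function that emits (key, value) pairs per record."""
    _url, text = record
    for word in text.split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce: aggregate every value sharing a key, like GROUP BY with SUM."""
    return key, sum(values)

def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle" step groups by key
    return dict(reducer(key, values) for key, values in groups.items())

word_counts = run_mapreduce(CRAWL_TABLE, map_fn, reduce_fn)
```

A real MR deployment distributes the map and shuffle phases across many machines; the grouping logic is the same as the dictionary used here.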
2.2 Key/Value Stores
Key/Value (KV) stores represent the simplest data model: a binary association (key, value). Early KV systems such as Memcached, Redis, Amazon Dynamo, and BerkeleyDB provided fast GET/SET/DELETE operations but required applications to manage schema and indexing. Modern KV engines (e.g., RocksDB, LevelDB) are often used as storage back‑ends for higher‑level DBMS.
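To illustrate why applications must manage schema and indexing themselves, here is a minimal in-memory sketch. The `TinyKV` class, the key naming scheme, and the sample users are hypothetical and do not correspond to the API of any system named above.

```python
import json

class TinyKV:
    """In-memory key/value store exposing only get/set/delete."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # True if the key existed, False otherwise.
        return self._data.pop(key, None) is not None

store = TinyKV()

# The store sees opaque strings; only the application knows they are JSON.
store.set("user:1000", json.dumps({"name": "Alice", "city": "Boston"}))
store.set("user:1001", json.dumps({"name": "Bob", "city": "Boston"}))

# There is no secondary index, so the application keeps one under its own key.
store.set("idx:city:Boston", json.dumps(["user:1000", "user:1001"]))

def users_in_city(kv, city):
    """Look up users by city through the application-maintained index."""
    keys = json.loads(kv.get(f"idx:city:{city}", "[]"))
    return [json.loads(kv.get(key))["name"] for key in keys]
```

Keeping `idx:city:Boston` consistent with the user records on every write is the application's job, which is the burden a DBMS with declared schemas and indexes removes.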
2.3 Document Databases
Document databases store collections of JSON‑like records. Each document contains hierarchical field/value pairs. Example JSON document:
{ "name": "First Last", "orders": [ { "id": 123, "items": [...] }, { "id": 456, "items": [...] } ] }
Originally popular for their schema‑less flexibility, many NoSQL document stores (e.g., MongoDB) have since added SQL interfaces and ACID support, narrowing the gap with relational systems.
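One way to see how close the document and relational models are is to flatten a nested document into rows. The sketch below uses a document shaped like the example above; the item values and the `to_rows` helper are hypothetical, since the original elides the item contents.

```python
import json

# Same shape as the example document; the item values are made up.
doc = json.loads("""
{ "name": "First Last",
  "orders": [ { "id": 123, "items": ["pen", "ink"] },
              { "id": 456, "items": ["paper"] } ] }
""")

def to_rows(document):
    """Flatten one customer document into (customer, order_id, item) rows."""
    rows = []
    for order in document["orders"]:
        for item in order["items"]:
            rows.append((document["name"], order["id"], item))
    return rows

rows = to_rows(doc)
```

Each nesting level becomes a column that a relational schema would normally split into a separate table joined by key.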
2.4 Column‑Family Stores
Column‑family (wide‑column) stores such as Google BigTable, Apache Cassandra, and HBase model data as sparse rows with column families. Example mapping:
User1000 → { "name": "Alice", "accounts": [123,456], "email": "[email protected]" }
User1001 → { "name": "Bob", "email": ["[email protected]","[email protected]"] }
These systems excel at high‑throughput writes but often lack joins and secondary indexes.
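The sparse-row idea can be sketched as a dictionary of dictionaries, where each row stores only the columns it actually has. `WideColumnTable` and its methods are hypothetical names for illustration, not the API of BigTable, Cassandra, or HBase.

```python
from collections import defaultdict

class WideColumnTable:
    """Sparse rows: each row key maps to only the columns that row defines."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))

    def scan(self, column, value):
        """Without a secondary index, a lookup by value is a full scan."""
        return sorted(k for k, cols in self._rows.items()
                      if cols.get(column) == value)

table = WideColumnTable()
table.put("User1000", "name", "Alice")
table.put("User1000", "accounts", [123, 456])
table.put("User1001", "name", "Bob")  # this row has no "accounts" column
```

The `scan` method shows the cost mentioned above: finding rows by anything other than the row key touches every row.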
2.5 Text Search Engines
Text search engines (e.g., Elasticsearch, Solr) build inverted indexes to support full‑text queries, but they provide only limited transactional guarantees. Most major RDBMSs now embed full‑text search capabilities, reducing the need for a separate engine.
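An inverted index maps each term to the set of documents containing it. The sketch below is a minimal version with whitespace tokenization and AND-only queries; the function names and sample documents are hypothetical, and real engines add stemming, ranking, and on-disk posting lists.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

DOCS = {
    1: "relational databases support SQL",
    2: "document databases store JSON",
    3: "SQL on JSON documents",
}
index = build_inverted_index(DOCS)
```

A query such as `search(index, "sql json")` intersects two posting sets instead of scanning every document.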
2.6 Array Databases
Array databases (e.g., rasdaman, kdb+, SciDB) store multi‑dimensional data such as scientific measurements. They support efficient slicing and aggregation across arbitrary dimensions, a capability that traditional row‑oriented RDBMSs lack.
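Slicing and per-dimension aggregation can be illustrated on a small two-dimensional array. This is a pure-Python sketch with hypothetical data and helper names; it shows the operations only, not how an array DBMS stores or chunks data.

```python
# Hypothetical 3x4 grid: rows are latitude bands, columns are longitude bands.
GRID = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

def slice_2d(grid, rows, cols):
    """Return the sub-array selected by two Python slice objects."""
    return [row[cols] for row in grid[rows]]

def mean_along(grid, axis):
    """axis=0 averages down each column; axis=1 averages across each row."""
    if axis == 0:
        return [sum(col) / len(col) for col in zip(*grid)]
    return [sum(row) / len(row) for row in grid]

window = slice_2d(GRID, slice(0, 2), slice(1, 3))  # first two rows, middle columns
column_means = mean_along(GRID, axis=0)
row_means = mean_along(GRID, axis=1)
```

In a row-oriented table the same grid would be stored as (row, column, value) tuples, and each of these operations would become a filter plus GROUP BY over coordinates.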
2.7 Vector Databases
Vector databases specialize in similarity search over high‑dimensional embeddings generated by AI models. Example record format:
(title, date, author, [embedding_vector])
Systems such as Pinecone, Milvus, and Weaviate build ANN indexes to accelerate nearest‑neighbor queries. Recent RDBMS releases (e.g., Oracle, ClickHouse, PostgreSQL extensions) have added native vector index support.
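The query that vector databases accelerate can be written as an exact, brute-force scan. The sketch below ranks records in the (title, date, author, embedding) format by cosine similarity; it is not an ANN index, and the records, three-dimensional embeddings, and function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical records; real embeddings have hundreds or thousands of dimensions.
RECORDS = [
    ("Intro to SQL", "2021-01-05", "Ann", [0.9, 0.1, 0.0]),
    ("Graph Queries", "2022-03-10", "Ben", [0.1, 0.9, 0.2]),
    ("SQL Tuning", "2023-07-21", "Cy", [0.8, 0.2, 0.1]),
]

def nearest(records, query_vec, k=2):
    """Exact k-nearest-neighbor search by scanning every record."""
    scored = [(cosine_similarity(emb, query_vec), title)
              for title, _date, _author, emb in records]
    scored.sort(reverse=True)
    return [title for _score, title in scored[:k]]

top = nearest(RECORDS, [1.0, 0.0, 0.0])
```

An ANN index trades a small amount of recall for avoiding this full scan, which is what makes the approach practical at large scale.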
2.8 Graph Databases
Graph databases model data as nodes and edges, using either RDF triples or property graphs. They are used for social‑network analysis, knowledge graphs, and recommendation engines. Graph query languages (e.g., Cypher, PGQL) and the SQL/PGQ extension to SQL are increasingly supported by relational systems, narrowing the performance gap with native graph engines.
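A relational engine can already express a bounded graph traversal with node and edge tables plus a recursive query. The sketch below uses Python's built-in `sqlite3` module and a standard recursive CTE rather than SQL/PGQ syntax; the table names, sample people, and the `reachable` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edge (src INTEGER, dst INTEGER);
    INSERT INTO node VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO edge VALUES (1, 2), (2, 3), (3, 4);
""")

def reachable(connection, start_name, max_hops):
    """Names reachable from start_name within max_hops, with hop counts."""
    return connection.execute("""
        WITH RECURSIVE walk(id, hops) AS (
            SELECT id, 0 FROM node WHERE name = ?
            UNION
            SELECT e.dst, w.hops + 1
            FROM walk AS w JOIN edge AS e ON e.src = w.id
            WHERE w.hops < ?
        )
        SELECT n.name, MIN(w.hops)
        FROM walk AS w JOIN node AS n ON n.id = w.id
        WHERE w.hops > 0
        GROUP BY n.name
        ORDER BY MIN(w.hops), n.name
    """, (start_name, max_hops)).fetchall()

within_two = reachable(conn, "Alice", 2)
```

The hop limit in the recursive step keeps the traversal bounded even if the edge table contains cycles.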
3 Conclusions
The survey finds that most non‑SQL, non‑relational systems occupy niche markets or are being absorbed into SQL‑centric ecosystems. MapReduce is largely obsolete, KV stores are often embedded in modern RDBMS, document and column‑family stores are converging toward SQL interfaces, and emerging vector and graph databases are rapidly gaining native support in relational products. The authors anticipate continued convergence, with SQL‑based systems providing the broadest functionality while specialized engines persist only where they offer clear performance or domain advantages.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.