Why Multi-Model Databases Are the Future of Cloud Data Management
The article explains how cloud-driven demands and diverse data types have spurred the rise of multi-model databases, detailing their architecture, storage structures, compression techniques, and access methods using SequoiaDB as a concrete example.
1. Cloud‑driven demand for Multi‑Model databases
Modern cloud‑native applications generate structured (relational), semi‑structured (JSON/XML) and unstructured (images, video, documents) data. Running a separate database service for each data model in a dbPaaS (database platform as a service) multiplies operational overhead and increases the risk of inconsistency between copies of the same data. A Multi‑Model database that natively supports these data types on a single platform reduces that complexity and cost.
2. Multi‑Model storage engine architecture
Two architectural patterns address heterogeneous data:
Polyglot Persistence – deploy multiple specialized databases side‑by‑side. Each workload gets optimal performance, but the system incurs higher deployment, monitoring and schema‑management complexity.
Multi‑Model database – embed several storage engines inside a single distributed database, exposing a unified API and metadata layer. This simplifies development, deployment and backup.
SequoiaDB follows the second pattern: a single distributed engine hosts relational tables, JSON documents, object data and full‑text indexes simultaneously.
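The practical difference between the two patterns is where routing happens. With Polyglot Persistence, the application picks a database. In a Multi‑Model database, a shared metadata layer makes that choice. The following toy sketch illustrates the second idea. Every class and method name is hypothetical and does not reflect SequoiaDB's internals. A single catalog records each container's data model, and one `put`/`get` API forwards each call to the matching engine.

```python
from typing import Any, Dict


class DocumentEngine:
    """Hypothetical engine for JSON-like documents (dicts)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._docs[key] = dict(value)

    def get(self, key: str) -> dict:
        return self._docs[key]


class ObjectEngine:
    """Hypothetical engine for opaque binary objects (LOB-like)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blobs[key] = bytes(value)

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class MultiModelStore:
    """Single entry point: callers never choose an engine themselves."""

    def __init__(self) -> None:
        self._engines = {"document": DocumentEngine(), "object": ObjectEngine()}
        self._catalog: Dict[str, str] = {}  # shared metadata: container -> data model

    def create_container(self, name: str, model: str) -> None:
        if model not in self._engines:
            raise ValueError(f"unsupported model: {model}")
        self._catalog[name] = model

    def put(self, container: str, key: str, value: Any) -> None:
        engine = self._engines[self._catalog[container]]
        engine.put(f"{container}/{key}", value)

    def get(self, container: str, key: str) -> Any:
        engine = self._engines[self._catalog[container]]
        return engine.get(f"{container}/{key}")
```

For example, `store.create_container("users", "document")` and `store.create_container("avatars", "object")` register two containers with different models, but both are read and written through the same two methods. Backup or monitoring code can likewise walk a single catalog instead of several independent systems.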
3. Storage data structures
SequoiaDB stores all data as BSON (Binary JSON) documents. BSON retains JSON’s hierarchical model and adds binary types (Date, BinData, etc.). Its length‑prefixed binary layout lets the engine traverse a document and skip over fields quickly, while still allowing schema‑less storage.
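The following sketch shows why a length‑prefixed layout is fast to traverse. It is a minimal encoder for a small subset of the public BSON specification: int32, int64, boolean, UTF‑8 string and embedded document. It is not SequoiaDB's own code. Every document and every string carries its byte length up front. As a result, `top_level_keys` can list the fields of a document without decoding nested values. For an embedded document, it simply jumps ahead by that document's length.

```python
import struct
from typing import Any, Dict, List


def encode_element(name: str, value: Any) -> bytes:
    key = name.encode("utf-8") + b"\x00"  # field names are NUL-terminated cstrings
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # int32
        return b"\x12" + key + struct.pack("<q", value)  # int64
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)  # embedded document
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc: Dict[str, Any]) -> bytes:
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    # The int32 prefix counts itself, the elements and the trailing NUL byte.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def top_level_keys(data: bytes) -> List[str]:
    """List top-level field names, skipping each value by its known size."""
    keys, pos = [], 4  # skip the document's own length prefix
    while data[pos] != 0:  # 0x00 marks the end of the document
        etype = data[pos]
        end = data.index(b"\x00", pos + 1)
        keys.append(data[pos + 1:end].decode("utf-8"))
        pos = end + 1
        if etype == 0x02:  # string: int32 length (includes NUL) + bytes
            pos += 4 + struct.unpack_from("<i", data, pos)[0]
        elif etype == 0x03:  # embedded doc: its length prefix covers all of it
            pos += struct.unpack_from("<i", data, pos)[0]
        elif etype == 0x10:
            pos += 4
        elif etype == 0x12:
            pos += 8
        elif etype == 0x08:
            pos += 1
        else:
            raise ValueError(f"unsupported element type 0x{etype:02x}")
    return keys
```

For `{"hello": "world"}`, the encoder produces the 22‑byte example from the BSON specification: `\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00`.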
Physical storage is organized into files, which are divided into pages; pages in turn are grouped into extents. Logical containers are:
Collection Space – a group of files that isolates a set of collections.
Collection – a logical container for BSON documents, analogous to a table.
Document – a single BSON record stored inside a collection.
Each collection consists of a linked list of extents; an extent is a linked list of pages. When a collection exhausts its current extent, the engine allocates a new extent and links it, allowing continuous growth without pre‑allocation.
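The following sketch models how a collection grows by chaining extents. The page size, the number of pages per extent and all class names are hypothetical. Pages are not modeled individually; each extent is treated as a fixed byte budget of `PAGES_PER_EXTENT` pages. SequoiaDB's real on‑disk format is more involved. The point is the growth rule: records are appended to the tail extent until it is full, and only then is a new extent allocated and linked.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

PAGE_SIZE = 4096  # hypothetical page size in bytes
PAGES_PER_EXTENT = 4  # hypothetical extent size, in pages


@dataclass
class Extent:
    capacity: int = PAGE_SIZE * PAGES_PER_EXTENT
    used: int = 0
    records: List[bytes] = field(default_factory=list)
    next: Optional["Extent"] = None  # link to the following extent

    def try_append(self, record: bytes) -> bool:
        if self.used + len(record) > self.capacity:
            return False
        self.records.append(record)
        self.used += len(record)
        return True


class Collection:
    def __init__(self, name: str) -> None:
        self.name = name
        self.head = Extent()
        self.tail = self.head

    def insert(self, record: bytes) -> None:
        if len(record) > PAGE_SIZE * PAGES_PER_EXTENT:
            raise ValueError("record larger than one extent")
        if not self.tail.try_append(record):
            new_extent = Extent()  # allocated on demand, not pre-allocated
            self.tail.next = new_extent
            self.tail = new_extent
            new_extent.try_append(record)

    def extent_count(self) -> int:
        count, ext = 0, self.head
        while ext is not None:
            count += 1
            ext = ext.next
        return count

    def scan(self) -> Iterator[bytes]:
        """Full scan: follow the extent chain from head to tail."""
        ext = self.head
        while ext is not None:
            yield from ext.records
            ext = ext.next
```

With a 16 KB extent, inserting ten 4,000‑byte records fills two extents with four records each. The last two records go into a third extent that is created only when needed.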
3.1 Structured and semi‑structured data
Structured data (fixed schema) and semi‑structured data (self‑describing JSON/XML) coexist in the same collection. Because BSON is schema‑less, fields can be added, removed or changed on a per‑document basis without schema migrations.
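The following sketch illustrates per‑document flexibility. It uses plain Python dicts as stand‑ins for BSON documents. The `find` helper is a hypothetical query function, not a SequoiaDB driver call. Three differently shaped documents share one collection, one of them gains a field after insertion, and dotted paths reach into nested fields. A document that lacks a queried field simply does not match; no migration is needed.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel: distinguishes "field absent" from a stored None


def get_path(doc: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as 'address.city' in a nested document."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def find(collection: List[Dict[str, Any]], conditions: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return documents whose fields equal every condition; absent fields never match."""
    return [
        doc for doc in collection
        if all(get_path(doc, path, _MISSING) == value for path, value in conditions.items())
    ]


orders = [
    {"_id": 1, "customer": "Ana", "total": 30},  # table-like, fixed shape
    {"_id": 2, "customer": "Bo", "total": 12, "address": {"city": "Lyon"}},  # nested extra field
    {"_id": 3, "customer": "Cy", "tags": ["gift"]},  # no "total" at all
]
orders[0]["coupon"] = "SPRING"  # add a field to one document, no schema change
```

`find(orders, {"address.city": "Lyon"})` returns only the second order. `find(orders, {"coupon": "SPRING"})` returns only the first.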
3.2 Unstructured data (LOB)
Large objects (LOBs) such as images, videos or PDFs are managed by a dedicated LOB subsystem. When a LOB is written, the engine:
Assigns a globally unique OID.
Splits the binary payload into fixed‑size shards (default 512 KB).
Hashes each shard’s key (the OID plus the shard’s sequence number) to select a target partition group.
Stores shard metadata in a LOBM file and the raw shard data in a LOBD file.
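The write steps above can be sketched as follows. Only the 512 KB shard size comes from the text. The rest is assumed for illustration:

- a random UUID stands in for SequoiaDB's OID;
- MD5 of `"oid:seq"` modulo the group count is a hypothetical placement hash;
- two in‑memory dicts per group stand in for the LOBM and LOBD files.

Because placement depends on both the OID and the sequence number, the shards of one large object spread across groups.

```python
import hashlib
import time
import uuid
from typing import Dict, List, Tuple

SHARD_SIZE = 512 * 1024  # default shard size stated in the text
NUM_GROUPS = 3  # hypothetical number of partition groups

Key = Tuple[str, int]  # (OID, shard sequence number)


class PartitionGroup:
    def __init__(self) -> None:
        self.lobm: Dict[Key, dict] = {}  # stand-in for the LOBM (metadata) file
        self.lobd: Dict[Key, bytes] = {}  # stand-in for the LOBD (data) file


def pick_group(oid: str, seq: int, num_groups: int = NUM_GROUPS) -> int:
    """Hypothetical placement rule: hash(OID + sequence) mod number of groups."""
    digest = hashlib.md5(f"{oid}:{seq}".encode()).digest()
    return int.from_bytes(digest[:4], "little") % num_groups


def write_lob(groups: List[PartitionGroup], payload: bytes) -> str:
    oid = uuid.uuid4().hex  # stand-in for a globally unique OID
    shards = [payload[i:i + SHARD_SIZE] for i in range(0, len(payload), SHARD_SIZE)] or [b""]
    for seq, chunk in enumerate(shards):
        group = groups[pick_group(oid, seq, len(groups))]
        meta = {"length": len(chunk)}
        if seq == 0:  # shard 0 also carries LOB-level metadata
            meta.update(lob_size=len(payload), created=time.time())
        group.lobm[(oid, seq)] = meta
        group.lobd[(oid, seq)] = chunk
    return oid
```

A 1,280,000‑byte payload, for instance, becomes three shards: two of 524,288 bytes and one of 231,424 bytes.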
Reading a LOB starts from its OID. The engine locates the shard with sequence=0, which holds the LOB’s size, creation time and other metadata. It then retrieves the remaining shards in sequence order and reassembles them.
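The read path can be sketched the same way. The MD5‑based `pick_group` placement rule and the in‑memory layout (one dict per partition group, mapping `(oid, seq)` to `(metadata, bytes)`) are hypothetical. Because the same hash is used on write and read, the reader computes each shard's location directly instead of scanning every group. The total size recorded in shard 0 tells it when to stop.

```python
import hashlib
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (OID, shard sequence number)
Group = Dict[Key, Tuple[dict, bytes]]  # hypothetical in-memory partition group


def pick_group(oid: str, seq: int, num_groups: int) -> int:
    """Hypothetical placement rule, identical to the one used on write."""
    digest = hashlib.md5(f"{oid}:{seq}".encode()).digest()
    return int.from_bytes(digest[:4], "little") % num_groups


def read_lob(groups: List[Group], oid: str) -> bytes:
    n = len(groups)
    # 1. Shard 0 holds the LOB-level metadata (total size, creation time, ...).
    header, first = groups[pick_group(oid, 0, n)][(oid, 0)]
    total = header["lob_size"]
    parts, received, seq = [first], len(first), 1
    # 2. Fetch the remaining shards in sequence order until the size is reached.
    while received < total:
        _, chunk = groups[pick_group(oid, seq, n)][(oid, seq)]
        if not chunk:
            raise ValueError(f"empty shard {seq} before end of LOB")
        parts.append(chunk)
        received += len(chunk)
        seq += 1
    data = b"".join(parts)
    if len(data) != total:
        raise ValueError("LOB is corrupt: size mismatch")
    return data
```

A missing shard surfaces as a `KeyError`. A size mismatch raises `ValueError` instead of silently returning a truncated object.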
3.3 Data access and compression
SQL interface – SequoiaDB implements PostgreSQL/MySQL‑compatible protocols, allowing existing SQL applications to connect without code changes.
Native APIs – Drivers for C, C++, Java, Python, Go, Node.js and other languages provide direct collection and document operations.
Compression – Two algorithms are available. Snappy compresses each record independently without a dictionary and favors speed. LZW compresses records using a dictionary built at the collection (table) level. Both reduce the storage footprint and can improve I/O throughput.
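The following sketch shows what "dictionary‑based" means for LZW. It is a textbook LZW codec, not SequoiaDB's implementation; Snappy is not in Python's standard library, so it is not shown. The compressor grows a dictionary of byte sequences it has already seen and emits one code per longest known match. The decompressor rebuilds the same dictionary from the code stream. One simplification to note: the dictionary here is built from a single input, whereas a collection‑level dictionary would be built once and reused across many records. Codes are also left as Python ints rather than packed into bits.

```python
from typing import Dict, List


def lzw_compress(data: bytes) -> List[int]:
    dictionary: Dict[bytes, int] = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out: List[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            out.append(dictionary[w])  # emit the longest known prefix
            dictionary[wc] = next_code  # learn the new sequence
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: List[int]) -> bytes:
    if not codes:
        return b""
    dictionary: Dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = w + w[:1]  # code defined by this very step (cScSc case)
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]  # mirror the compressor's dictionary
        next_code += 1
        w = entry
    return b"".join(out)
```

On repetitive input such as `b"TOBEORNOTTOBEORTOBEORNOT"`, the output has fewer codes than the input has bytes. Round‑tripping through `lzw_decompress` returns the original bytes.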
SequoiaFS – A POSIX‑style file system built on FUSE maps LOB collections to a virtual directory hierarchy, enabling standard file operations (open, read, write, delete) on distributed LOB data.
4. Summary
Multi‑Model databases such as SequoiaDB provide a unified storage engine that can manage relational, JSON, object and full‑text data together, while offering SQL compatibility, language‑native APIs, built‑in compression and a FUSE‑based file system for unstructured data. This architecture aligns with cloud‑native requirements for scalability, operational simplicity and cost efficiency, and reflects the broader industry trend of extending traditional relational systems with native JSON and LOB support.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
