
MatrixOne Storage Format Design Overview

This article provides a comprehensive overview of MatrixOne's hyper‑converged cloud‑native database architecture, detailing its three‑layer design, data execution flow, columnar storage format, metadata hierarchy, performance optimizations, compatibility mechanisms, and practical usage scenarios.

DataFunTalk

MatrixOne is a future‑oriented, hyper‑converged, heterogeneous cloud‑native database management system. A single unified distributed engine supports OLTP, OLAP, and streaming workloads, and the system can be deployed seamlessly on public clouds, in private data centers, and on edge nodes, offering simplicity, low cost, high performance, and scalability.

The system is organized into three layers: a compute layer of multiple nodes responsible for data extraction and computation; a middle layer handling transaction processing, metadata, and a shared log service; and an interface layer that connects to various storage backends (e.g., S3, local files, NAS) via a File Service.

When a user inserts data, the request is routed to the transaction layer and written to the shared log; once enough rows have accumulated (typically thousands), the buffered data is flushed to disk as a block. This path is what the article calls the data execution flow.
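The write path above can be sketched as a small buffer that records each row in a shared log, accumulates rows in memory, and seals them into an immutable block once a threshold is crossed. This is an illustrative sketch, not MatrixOne's actual code: the names `WriteBuffer`, `FLUSH_THRESHOLD`, and the list-based "log" and "block store" are assumptions for the demo.

```python
FLUSH_THRESHOLD = 4  # MatrixOne flushes after "thousands" of rows; tiny here for demonstration

class WriteBuffer:
    def __init__(self):
        self.rows = []        # rows buffered in memory, not yet flushed
        self.blocks = []      # stands in for blocks persisted to S3 / local disk
        self.shared_log = []  # stands in for the shared log service

    def insert(self, row):
        self.shared_log.append(row)  # 1. durably record the write in the shared log
        self.rows.append(row)        # 2. buffer the row in memory
        if len(self.rows) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        if self.rows:
            # 3. seal the buffered rows into one immutable block
            self.blocks.append(tuple(self.rows))
            self.rows = []

buf = WriteBuffer()
for i in range(10):
    buf.insert(i)
# two full blocks flushed, two rows still buffered, ten log entries
```

The key property is that durability comes from the log append, so the in-memory buffer can be rebuilt by replay after a crash.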

Key considerations include buffering small writes in memory before flushing, continuously merging data blocks to support analytical queries, and handling large writes by directly persisting to the database or S3 and then publishing through a service.
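The continuous merging of small blocks mentioned above can be pictured as a k-way merge of sorted blocks into one larger block, so analytical scans touch fewer objects. A minimal sketch, assuming each block is a sorted list of keys (this is not MatrixOne's merge implementation):

```python
import heapq

def merge_blocks(blocks):
    """Merge several sorted blocks (lists of keys) into one larger sorted block."""
    # heapq.merge streams the inputs, so the merge does not need to
    # materialize all blocks in memory at once
    return list(heapq.merge(*blocks))

small_blocks = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
merged = merge_blocks(small_blocks)
```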

The storage format consists of a Header (recording version, format, metadata location, and checksum), a Data region (containing columnar blocks), and a Footer that mirrors the Header. Each block stores column data, indexes, Bloom filters, and other auxiliary structures.
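The Header / Data / Footer layout can be sketched as follows: the header records a magic number, version, metadata offset, and a checksum over the data region, and the footer mirrors the header byte-for-byte so the file can be opened from either end. The field widths, magic value, and 12-byte header size are assumptions for illustration, not MatrixOne's on-disk format.

```python
import struct
import zlib

MAGIC, VERSION = 0x4D4F, 1  # illustrative magic ("MO") and format version

def write_object(column_blocks):
    data = b"".join(column_blocks)
    meta_offset = 12 + len(data)  # header is 12 bytes in this sketch (HHII)
    header = struct.pack("<HHII", MAGIC, VERSION, meta_offset,
                         zlib.crc32(data))  # checksum guards the data region
    return header + data + header  # footer mirrors the header

def read_footer(obj):
    # a reader can locate metadata from the tail without scanning the file
    return struct.unpack("<HHII", obj[-12:])

obj = write_object([b"col1", b"col2"])
```

Mirroring the header in the footer is a common columnar-format trick: a reader that only has the file's tail (for example, the last range of an S3 GET) can still find the metadata.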

Metadata is organized hierarchically: the Header identifies the object type and version, the Data region holds blocks of column data, and SubMeta stores catalog, block, and object information. Checkpoint and replay mechanisms, along with metadata and index caches, enable fast access and efficient recovery.

Each block has its own meta (ColumnMeta) describing column ID, type, NDV, null count, data extent, checksum, and zone map. A block index records offsets so that reads can locate the required blocks without scanning the entire file.
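The per-block `ColumnMeta` and its zone map enable block pruning: a query reads a block only when the block's [min, max] range can contain the predicate value. A sketch under the field names described above (the exact identifiers in MatrixOne differ):

```python
from dataclasses import dataclass

@dataclass
class ColumnMeta:
    col_id: int
    ndv: int       # number of distinct values in this block's column
    null_cnt: int  # null count
    zone_min: int  # zone map: minimum value in this block
    zone_max: int  # zone map: maximum value in this block

def blocks_to_scan(metas, value):
    """Return indexes of blocks whose zone map may contain `value`."""
    return [i for i, m in enumerate(metas)
            if m.zone_min <= value <= m.zone_max]

metas = [ColumnMeta(0, 100, 0, 1, 50),
         ColumnMeta(0, 90, 2, 51, 120),
         ColumnMeta(0, 80, 0, 121, 400)]
hit = blocks_to_scan(metas, 75)  # only the middle block can contain 75
```

Zone maps give false positives (a block whose range covers the value may still lack it) but never false negatives, so pruning is always safe.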

Performance is ensured through in‑memory byte‑level representations, pointer‑based parsing that achieves nanosecond‑level deserialization, and block‑ID based copy operations that simplify data writes.
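Pointer-based parsing can be illustrated with a block index of (offset, length) pairs: reading one column is arithmetic over the mapped bytes rather than deserializing the whole file. The sketch below uses Python's `memoryview` as a stand-in for zero-copy pointer access; the layout is an assumption, not MatrixOne's format.

```python
def build_block(columns):
    """Concatenate column payloads and record (offset, length) per column."""
    index, payload, off = [], b"", 0
    for col in columns:
        index.append((off, len(col)))
        payload += col
        off += len(col)
    return index, payload

def read_column(index, payload, i):
    off, length = index[i]
    # memoryview slicing references the underlying bytes without copying,
    # mimicking a pointer into a mapped file region
    return memoryview(payload)[off:off + length]

index, payload = build_block([b"aaaa", b"bb", b"cccccc"])
col = read_column(index, payload, 2)
```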

Compatibility is achieved by embedding Type and Version in the IOEntryHeader, allowing the system to register appropriate encode/decode functions for different versions. Migration tools such as MoDump enable loading of legacy datasets, and the system supports various use cases through checkpoint/replay and catalog‑driven queries.
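The version-compatibility mechanism can be sketched as a registry keyed by the (type, version) pair read from an entry header: each format revision registers its own decode function, so old and new objects coexist in one deployment. The registry, decorator, and payload formats below are illustrative assumptions, not MatrixOne's API.

```python
_decoders = {}

def register(entry_type, version):
    """Register a decode function for one (type, version) pair."""
    def wrap(fn):
        _decoders[(entry_type, version)] = fn
        return fn
    return wrap

@register("block", 1)
def decode_block_v1(raw):
    # hypothetical v1 payload: comma-separated rows
    return {"rows": raw.split(b",")}

@register("block", 2)
def decode_block_v2(raw):
    # hypothetical v2 payload: prefixes the rows with a row count
    count, _, body = raw.partition(b";")
    return {"count": int(count), "rows": body.split(b",")}

def decode(entry_type, version, raw):
    # the (type, version) pair from the IOEntryHeader selects the codec
    return _decoders[(entry_type, version)](raw)

v1 = decode("block", 1, b"a,b")
v2 = decode("block", 2, b"2;a,b")
```

New versions only add registrations; existing decode paths are never touched, which is what makes legacy data loaded by tools like MoDump readable alongside freshly written objects.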

The Q&A section clarifies that catalog updates are handled via incremental checkpoints and that MoDump can be used to import existing data into MatrixOne, demonstrating the system's practical applicability.

Tags: performance, Storage Engine, metadata, Distributed Database, Compatibility, MatrixOne
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
