MonetDB: History, Storage Model, Execution Model, Architecture, and Key Technologies
This article provides a comprehensive overview of MonetDB, covering its origins at CWI, column‑oriented storage with BATs, memory‑mapped and vectorized execution, three‑layer system architecture, cache‑aware optimizations such as vector operations and radix‑partitioned hash joins, as well as its limitations and reference sources.
1 History
MonetDB was developed by CWI, the research institute of the University of Amsterdam, which also created Python and VectorWise.
It originated in the 1990s from a data‑mining project that required an analytical database, initially called Data Distilleries.
The name MonetDB appeared in 2002; version 4 was released and open‑sourced on 30 September 2004, supporting the SQL:2003 standard.
In 2008, the MonetDB/X100 project introduced vector‑oriented CPU‑cache optimizations, later merged into VectorWise.
MonetDB 5 arrived in 2011, redesigning the low‑level API from MonetDB Instruction Language (MIL) to MonetDB Assembly Language (MAL).
Since 2015 the project is licensed under the Mozilla Public License 2.0.
2 Storage Model
MonetDB uses a column‑store where each column is a BAT (Binary Association Table) that stores an array of (OID, value) pairs; the OID is the row identifier and need not be physically stored. Fixed‑length types (integer, decimal, float) are stored as plain arrays, while variable‑length types (e.g., strings) use dictionary‑based BLOB storage.
Data is memory‑mapped, meaning the in‑memory structures mirror the on‑disk file layout. Queries employ late tuple reconstruction, materializing results only when they are sent to the client. The execution engine follows a volcano model with vectorized processing to improve CPU cache utilization.
3 Execution Model
The kernel is an abstract machine implemented in MonetDB Assembly Language (MAL). Each operator corresponds to a MAL instruction that works on two BATs; complex expressions are broken down into a sequence of simple instructions (bulk processing). The figure below shows a SELECT operator and its MAL implementation.
These simple loops generate high instruction locality, reduce cache misses, and are amenable to compiler and processor optimizations.
4 System Architecture
MonetDB’s query processing consists of three software layers:
Front‑end : maps user queries to BATs and translates the query language into MAL, applying strategic optimizations such as predicate push‑down and join indexing.
Back‑end : performs MAL optimizations and converts MAL commands into kernel calls; includes tactical optimizations like cost‑based optimizers.
Kernel : operates on BATs and executes the operators.
5 Key Technologies
5.1 Memory Model
CPU clock speeds have increased dramatically, but memory latency has not kept pace, creating the “memory wall”. Modern CPUs use multi‑level caches (L1, L2, L3) to bridge this gap. Typical access times are 1 ns for L1, 4 ns for L2, and 100 ns for DRAM, yielding 25‑100× speed differences.
5.2 Vector Operations
MonetDB processes data in vectors (chunks) rather than whole columns, keeping each chunk in CPU cache to minimise memory‑cache traffic. Benchmarks show optimal performance when vector sizes are around 1 000‑4 000; larger vectors cause cache thrashing and slower execution.
5.3 Cache‑Conscious Joins
MonetDB replaces the traditional Grace Hash‑Join with a radix‑partitioned hash‑join that aligns partition counts with cache line sizes, reducing cache thrashing. By using a power‑of‑two number of clusters that fits within cache lines, the join avoids excessive memory traffic.
6 Drawbacks
MonetDB relies heavily on memory‑mapped files; when data exceeds RAM and swapping occurs, performance degrades sharply.
Decimal type is limited to 18 digits, which may be insufficient for some applications.
7 References
1. https://en.wikipedia.org/wiki/MonetDB 2. https://stratos.seas.harvard.edu/files/MonetDebull2012.pdf 3. https://www.monetdb.org/AboutUs 4. https://ir.cwi.nl/pub/16497/16497B.pdf 5. http://www.gdmc.nl/projects/rgi-otb/geoinfoned/documents/RGI232-2008-01-p77-boncz.pdf 6. https://jacksondunstan.com/articles/3860 7. http://web.cecs.pdx.edu/~jrb/cs201/lectures/cache.friendly.code.pdf
Tencent Database Technology
Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.