
How Memory Databases Handle Concurrency, Persistence, and Query Processing

This article explores the concurrency control strategies, persistence mechanisms, and compiled query processing techniques used by modern memory databases, comparing systems like Hekaton, HyPer, HANA, and others while highlighting performance trade‑offs and architectural considerations.

StarRing Big Data Open Lab

Database Management Systems: Concurrency Control

1. Two Strategies for Concurrency Control in Memory Databases

Memory databases mainly use two categories of concurrency control: multi‑version concurrency control (MVCC) and partition‑based processing. Either can be implemented optimistically or pessimistically. Pessimistic control assumes conflicts are frequent and locks resources before access; optimistic control assumes conflicts are rare, validates at commit time, and rolls back on conflict.

a. Multi‑Version Concurrency Control

Optimistic MVCC does not rely on locks and usually creates a new version for each update. Readers select an appropriate older version, so reads do not block writes, achieving high concurrency. The downside is extra overhead for version creation and garbage collection of old versions. Memory databases often adopt optimistic MVCC because it incurs lower overhead than lock‑based pessimistic control, though high write contention can increase abort rates.
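The version-selection rule above can be sketched in a few lines. This is an illustrative model, not any system's actual code: each update appends a version tagged with its commit timestamp, and a reader takes the newest version that committed at or before the reader's start timestamp, so reads never block writes.

```python
class VersionChain:
    """Toy MVCC version chain for a single record (names are invented)."""

    def __init__(self):
        self.versions = []  # list of (commit_ts, value), kept in commit order

    def write(self, commit_ts, value):
        # Each update creates a new version instead of overwriting in place.
        self.versions.append((commit_ts, value))

    def read(self, read_ts):
        # Scan newest-to-oldest for the first version visible to this reader.
        for commit_ts, value in reversed(self.versions):
            if commit_ts <= read_ts:
                return value
        return None  # no version existed yet at this reader's timestamp


chain = VersionChain()
chain.write(10, "v1")
chain.write(20, "v2")
old_read = chain.read(15)  # a reader that started at ts=15 still sees "v1"
new_read = chain.read(25)  # a later reader sees "v2"
```

The garbage-collection cost mentioned above comes from the `versions` list growing until old entries are provably invisible to every active reader.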

b. Partition‑Based Processing

The second strategy splits the database into multiple partitions, each processed serially. A single partition incurs no extra concurrency overhead, but cross‑partition transactions cause a sharp drop in throughput, making performance unpredictable when workloads cannot be confined to a single partition.

2. MVCC in Hekaton

Hekaton uses optimistic MVCC. When a transaction starts, the system assigns it a begin (read) timestamp and marks it active. During execution it records the data it reads and scans. In the pre‑commit phase it obtains an end timestamp, validates that its reads and scans would still yield the same results, writes its new versions to the log, and commits. In post‑processing, the end timestamp is written into the versions the transaction created, after which the transaction fully terminates.

a. Hekaton Transaction Validation

i) Read Stability

Hekaton guarantees read stability: the version of each record read at transaction start remains visible at commit.

ii) Phantom Avoidance

Phantom rows occur when a transaction’s repeated query yields different results because other transactions modified the data set. Hekaton avoids phantoms by rescanning and checking for newer versions, ensuring the data seen at start matches that at end.

Read‑only transactions need no validation under MVCC. For update transactions, validation depends on the isolation level: snapshot isolation requires none, repeatable read requires read stability, and serializable requires both read stability and phantom avoidance.
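The isolation-level mapping above can be expressed as a small validation routine. This is a simplified sketch; the read-set and rescan bookkeeping here is invented for illustration and is far cruder than Hekaton's actual version checks.

```python
def validate(isolation, read_set, store, rescan):
    """Commit-time validation keyed by isolation level (toy model).

    read_set: {key: value_seen_during_txn}
    store:    current committed values, {key: value}
    rescan:   callable re-running the txn's scans, returning the key set seen now
    """
    if isolation == "snapshot":
        return True  # snapshot isolation: no read validation needed
    # Read stability: every record read must still hold the value we saw.
    stable = all(store.get(k) == v for k, v in read_set.items())
    if isolation == "repeatable_read":
        return stable
    if isolation == "serializable":
        # Read stability AND phantom avoidance: a rescan must return
        # exactly the keys the transaction saw during execution.
        return stable and rescan() == set(read_set)
    raise ValueError(f"unknown isolation level: {isolation}")


store = {"a": 1, "b": 2}
ok_snapshot = validate("snapshot", {"a": 1}, store, lambda: {"a"})
ok_rr = validate("repeatable_read", {"a": 1}, store, lambda: {"a"})
bad_rr = validate("repeatable_read", {"a": 0}, store, lambda: {"a"})
phantom = validate("serializable", {"a": 1}, store, lambda: {"a", "b"})
```

Note how the serializable case fails even though the read itself was stable: the rescan now sees `"b"`, i.e. a phantom.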

b. Hekaton Reclamation Strategy

Reclamation is performed by each transaction rather than a dedicated thread. When a transaction ends, its write set is examined; if all active transactions have start timestamps greater than the ended transaction’s end timestamp, the old versions can be reclaimed by the ending transaction.
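The reclamation test reduces to a timestamp comparison. A minimal sketch of that check, with invented names:

```python
def can_reclaim(superseded_end_ts, active_begin_timestamps):
    """An ending transaction may reclaim the old versions it superseded
    once every still-active transaction began after those versions became
    stale -- i.e. no active reader could ever need them."""
    return all(begin_ts > superseded_end_ts for begin_ts in active_begin_timestamps)


ok = can_reclaim(100, [110, 120])      # all readers started after ts=100
blocked = can_reclaim(100, [90, 120])  # a reader from ts=90 may still need the old version
```

Because the check runs in the ending transaction itself, reclamation work is spread across all transactions instead of being concentrated in a background thread.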

3. MVCC in HyPer

HyPer differs from Hekaton in three ways: (1) updates are applied in place at the record's location, with an undo buffer linking the before‑images of each modification; (2) validation checks the transaction's reads against the writes that concurrently committed transactions made in the meantime; (3) commits are serialized and applied in order.

HyPer records read predicates (including range scans) in the log and uses the undo buffer to track inserted, deleted, and updated records, simplifying change detection.
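The in-place-update-plus-undo-buffer design can be sketched as follows. This is an analogy in Python, with invented names; HyPer's real undo buffer holds physical before-images, not dictionary entries.

```python
class Txn:
    """Toy transaction doing in-place updates with an undo buffer."""

    def __init__(self, store):
        self.store = store
        self.undo = []  # list of (key, before_image); None = key did not exist

    def update(self, key, value):
        # Save the before-image first, then update the record in place.
        self.undo.append((key, self.store.get(key)))
        self.store[key] = value

    def rollback(self):
        # Apply undo entries newest-first to restore all before-images.
        for key, before in reversed(self.undo):
            if before is None:
                del self.store[key]
            else:
                self.store[key] = before
        self.undo.clear()


store = {"x": 1}
t = Txn(store)
t.update("x", 2)   # in-place update; undo buffer remembers x=1
t.update("y", 9)   # insert; undo buffer remembers y did not exist
t.rollback()       # store is restored to {"x": 1}
```

The undo buffer doubles as the change log mentioned above: the set of inserted, deleted, and updated records is exactly what it contains.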

For each transaction, the system checks whether any other transaction modified the data set between start and commit to detect phantoms; if found, the transaction aborts.

4. MVCC in HANA and HStore/VoltDB

HANA uses a simple pessimistic MVCC with row‑level locks protected by timestamps. Each transaction must acquire a write lock for updates or deletes, and deadlock detection is performed.

HStore/VoltDB is a partitioned system where each partition holds a coarse‑grained lock. Single‑partition transactions are fast, but multi‑partition transactions suffer from low efficiency and are sensitive to load skew.

5. Load‑aware Scheduling (Calvin)

If a workload’s read/write sets are known in advance, transaction order can be predetermined. Calvin implements a very lightweight locking (VLL) prototype based on this assumption. While most workloads cannot predict sets, stored‑procedure‑driven applications can benefit from Calvin‑like designs.
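When read/write sets are known up front, conflicting transactions can simply be ordered ahead of time. The sketch below shows one simplified consequence of that idea (the batching scheme is my illustration, not Calvin's actual VLL algorithm): transactions join the current batch only if their read/write set is disjoint from everyone already in it, so each batch is conflict-free and batches execute in a predetermined order.

```python
def batch_in_order(transactions):
    """Group (txn_id, rw_set) pairs into conflict-free batches, preserving
    arrival order. Within a batch, all read/write sets are disjoint, so the
    batch's transactions could run concurrently without any locking."""
    batches = [[]]
    for tid, rw_set in transactions:
        if all(rw_set.isdisjoint(other) for _, other in batches[-1]):
            batches[-1].append((tid, rw_set))
        else:
            batches.append([(tid, rw_set)])  # conflict: start the next batch
    return batches


txns = [(1, {"a"}), (2, {"b"}), (3, {"a", "c"})]
batches = batch_in_order(txns)
# txn 1 and 2 are disjoint and share a batch; txn 3 conflicts on "a"
```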

Persistence Techniques in Memory Databases

Like disk‑based DBMSs, memory databases require logging and checkpoints. Checkpoints enable recovery from the latest point without replaying the entire log, but they involve disk I/O and thus impact performance.

Key differences: memory‑DB logs and checkpoints may omit index information, reconstructing indexes on recovery; checkpoints write the entire data set to disk, not just dirty pages, resulting in larger I/O volume.

Hekaton Checkpoint

Performance‑optimized persistence in Hekaton ensures high‑throughput, low‑latency logging and fast recovery. Logs record only data updates, not index changes. During checkpoint, Hekaton partitions the key range (e.g., 100‑199, 200‑299, 300‑399) and writes data and deleted records separately. Recovery replays logs in parallel, rebuilding indexes and filtering out deleted records.
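The recovery step described above amounts to a merge per key range: load the range's data records, drop any key that appears in the range's delete records, and rebuild the index from what survives. A minimal sketch, with an invented file layout:

```python
def recover_range(data_records, deleted_keys):
    """Rebuild one key range's in-memory index from checkpoint files.

    data_records: iterable of (key, value) rows from the range's data file
    deleted_keys: set of keys from the range's delete-record file
    """
    index = {}
    for key, value in data_records:
        if key not in deleted_keys:  # filter out records deleted later
            index[key] = value
    return index


# Two checkpoint ranges (e.g. 100-199 and 200-299); Hekaton would replay
# them in parallel, but here they are merged sequentially for clarity.
index = {}
index.update(recover_range([(101, "a"), (150, "b")], {150}))
index.update(recover_range([(205, "c")], set()))
```

Because indexes are rebuilt here rather than logged, the log and checkpoint files stay smaller, at the cost of extra CPU work during recovery.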

Other Systems’ Checkpoints

1. Systems using logical logging (e.g., H‑Store/VoltDB) record the executed operations instead of the raw data changes, which reduces log size. During checkpointing they use copy‑on‑write so transaction processing can continue.

2. Some systems periodically write snapshots to disk without index information, as HyPer does using OS‑level copy‑on‑write snapshot functionality (fork).
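The copy-on-write idea behind both items can be sketched with a toy page store (this dict-based model is an analogy only; the real mechanism is `fork()` sharing physical pages at the OS level): the checkpointer reads a frozen view while writers rebind pages underneath it.

```python
class CowStore:
    """Toy copy-on-write page store (illustrative names)."""

    def __init__(self):
        self.pages = {}

    def snapshot(self):
        # Shallow copy of the page table: page contents are shared with the
        # live store until a writer replaces a page.
        return dict(self.pages)

    def write(self, page_id, data):
        # Rebinding the page in the live table leaves any earlier snapshot's
        # view of that page untouched.
        self.pages[page_id] = data


db = CowStore()
db.write("p1", "old")
snap = db.snapshot()   # checkpoint begins: freeze a consistent view
db.write("p1", "new")  # concurrent update lands after the snapshot
```

The checkpointer can now serialize `snap` to disk at leisure while transactions keep running against `db`.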

Query Processing in Memory DBMS

Traditional volcano‑model query processing uses generic operators that parse byte arrays at runtime, which is costly in memory databases where type information could be known at compile time.

Additionally, repeated get‑next calls and virtual function dispatch introduce overhead. Therefore, compiled query execution is preferred.

Memory databases typically parse, optimize, and generate executable code for a query, then load and execute the compiled machine code directly.
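A toy version of that pipeline, using Python's built-in `compile`/`exec` in place of a real code generator (the function names and the trivial filter "plan" are invented for illustration): instead of interpreting a generic predicate tree per row, specialized source is generated for this exact query, compiled once, and then called directly.

```python
def compile_filter(column, threshold):
    """Generate and compile a filter specialized to one column and constant.

    The emitted source hard-codes the column name and threshold, so the
    compiled function does no per-row type dispatch or plan interpretation.
    """
    src = (
        "def q(rows):\n"
        f"    return [r for r in rows if r[{column!r}] > {threshold}]\n"
    )
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["q"]


q = compile_filter("age", 30)            # compile once...
result = q([{"age": 25}, {"age": 40}])   # ...execute many times
```

Real systems emit C or LLVM IR rather than Python source, but the structure is the same: pay a one-time compilation cost, then run specialized machine code on every subsequent execution.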

Performance measurements show that compiled execution dramatically reduces execution time after the first run.

Hekaton’s Compiled Query Processing

Hekaton compiles T‑SQL stored procedures through an intermediate representation, the Mixed Abstract Tree (MAT), into C code, which is then compiled into a loadable library. Unlike generic operators that interpret types at runtime, the compiled code knows the table definitions in advance, yielding a 3‑4× performance boost.

HyPer and MemSQL Compilation

HyPer splits the query tree at pipeline boundaries and compiles each pipeline segment. MemSQL uses LLVM to compile its plan language (MPL) into native machine code.

In benchmarks, MemSQL's compiled execution cuts a query's time from 0.05 s to 0.02 s once the compiled plan is reused after the first run.

Other Memory Database Systems

Additional notable memory DBMS include:

SolidDB: a hybrid disk/memory engine dating to 1992, using variable‑length trie indexes and pessimistic locking with snapshot checkpoints.

Oracle TimesTen: originated as HP's Smallbase; now an in‑memory acceleration engine for larger RDBMSs, employing row‑level latching and WAL‑based persistence.

Altibase: a Korean‑origin system with page‑level checkpoints, MVCC, and latch‑free page handling.

P*Time: later incorporated into SAP HANA; uses differential encoding for logs and supports larger‑than‑memory deployments.

Summary

Each DBMS is designed for specific hardware constraints. Disk‑based systems contend with CPU, memory, and disk bottlenecks, whereas memory databases eliminate disk I/O but face challenges such as lock/latch overhead, cache‑line misses, pointer chasing, predicate evaluation, data movement, and network costs in distributed setups.

We have compared traditional disk‑based DBMS and memory databases across data organization, indexing, concurrency control, compiled query processing, and persistence, highlighting the unique trade‑offs and optimization techniques of in‑memory systems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

StarRing Big Data Open Lab

Focused on big data technology research, exploring the Big Data era | [email protected]
