
Understanding Cassandra’s Row‑Oriented Storage, Write Path, and Consistency

This article explains Cassandra’s row‑oriented storage model, the multi‑step write and read processes, how tombstones and compaction manage data growth, and the impact of its distributed architecture on high availability, fault tolerance, and configurable consistency levels.

dbaplus Community

Feb 4, 2020

Row‑Oriented Storage

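Although Cassandra is often described as row‑oriented, it stores data in a sparse matrix rather than a strictly row‑oriented or column‑oriented format. Each row must contain its primary key but can include any subset of the remaining columns, and columns that were never written are simply absent. Traditional row stores do not offer this kind of dynamic column presence.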

Rows are distributed by their Partition Key and, within a partition, kept sorted by their Clustering Key. As a result, queries that specify the partition key (and optionally a clustering‑key range) can be answered without an additional sort, much as traditional OLTP databases use primary‑key indexes.
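A minimal Python sketch of this layout, under simplifying assumptions: SparseTable, upsert, and slice are hypothetical names, a table is modeled as a dict from partition key to a list of rows sorted by clustering key, and each row is a dict holding only the columns that were actually written. Unlike Cassandra, which defers merging to reads and compaction, this sketch merges updates in place for brevity.

```python
import bisect
from collections import defaultdict


class SparseTable:
    """Hypothetical model: partition key -> rows sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted list of (clustering_key, row_dict)
        self.partitions = defaultdict(list)

    def upsert(self, partition_key, clustering_key, **columns):
        rows = self.partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            # Existing row: add or overwrite only the columns supplied.
            rows[i][1].update(columns)
        else:
            # New row: insert at its sorted position; unwritten columns stay absent.
            rows.insert(i, (clustering_key, dict(columns)))

    def slice(self, partition_key, start, end):
        """Rows with start <= clustering_key < end, already in order (no sort)."""
        rows = self.partitions.get(partition_key, [])
        keys = [ck for ck, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return rows[lo:hi]
```

Because rows within a partition are already ordered, a clustering‑key range query reduces to a binary search plus a slice.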

Write Path

When a write request arrives, Cassandra follows five main steps:

Append the operation to the Commit Log (similar to a REDO log). The entry is first written to memory and later synced to disk according to the commitlog_sync setting (for example, periodic or batch).

Add the data to an in‑memory memtable, avoiding index lookups, disk I/O, and locking.

If the row is cached, invalidate the corresponding entry in the Row Cache.

When the memtable reaches its size threshold, flush it asynchronously to disk as an immutable SSTable.

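Handle node failures through replication and hinted handoff: the remaining replicas still accept the write, and the coordinator stores a hint that it replays once the failed node recovers.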
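The steps above can be sketched for a single node as follows. This is an illustrative model, not Cassandra's implementation: Node, memtable_limit, and the list‑based commit log are hypothetical stand‑ins, the flush runs synchronously for simplicity, and step 5 (replication and hinted handoff) is omitted because it involves other nodes.

```python
import time


class Node:
    """Hypothetical single-node write path: commit log, memtable, row cache, flush."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []      # stand-in for the on-disk, append-only commit log
        self.memtable = {}        # key -> (timestamp, value)
        self.row_cache = {}       # key -> cached value
        self.sstables = []        # flushed, immutable snapshots (newest last)
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)

    def write(self, key, value, ts=None):
        # CQL write timestamps are microseconds since the epoch.
        ts = ts if ts is not None else time.time_ns() // 1000
        self.commit_log.append((key, ts, value))   # 1. log for durability
        current = self.memtable.get(key)
        if current is None or ts >= current[0]:
            self.memtable[key] = (ts, value)       # 2. in-memory insert, no disk read
        self.row_cache.pop(key, None)              # 3. invalidate any cached row
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                           # 4. flush (synchronous here)

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable and start a new one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: Cassandra recycles commit log segments only after
        # all the data they cover has been flushed.
        self.commit_log.clear()
```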

This append‑only design, combined with per‑cell write timestamps, eliminates read‑before‑write and lock contention and delivers high write throughput. Conditional statements, known as lightweight transactions (for example, INSERT ... IF NOT EXISTS), reintroduce a read plus a Paxos consensus round and therefore reduce performance.
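To illustrate why timestamps remove the need to read before writing, here is a hypothetical reconcile function that merges two versions of a row cell by cell, keeping whichever cell has the higher write timestamp. It assumes microsecond timestamps and does not reproduce Cassandra's tie‑breaking rules for equal timestamps.

```python
from typing import Dict, Tuple

Cell = Tuple[int, object]   # (write timestamp in microseconds, value)
Row = Dict[str, Cell]       # column name -> cell


def reconcile(a: Row, b: Row) -> Row:
    """Merge two versions of a row cell by cell; the higher timestamp wins."""
    merged = dict(a)
    for column, cell in b.items():
        if column not in merged or cell[0] > merged[column][0]:
            merged[column] = cell
    return merged
```

Apart from timestamp ties, the merge result does not depend on the order in which versions arrive, so a blind update of one column never needs the rest of the row.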

Read Path

Reading a row on a single node proceeds through several stages:

If the row is present in the Row Cache, return it immediately.

Check the Key Cache, which maps partition keys to their offsets within SSTable data files, so a hit skips the on‑disk index lookup.

Search the memtable(s) for the partition; memtables are sorted in‑memory structures, so this is a key lookup rather than a linear scan.

Read the relevant SSTable(s) from disk, applying Bloom filters and index summaries to avoid unnecessary I/O.

Merge results, update the Row Cache if appropriate, and return the latest version based on timestamps.
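A simplified single‑node read path, assuming hypothetical names (ReadPathNode, may_contain) and plain dicts for the memtable and SSTables. The Bloom filter is replaced by an exact membership test and the Key Cache is omitted, so the sketch only shows the order of lookups and the timestamp‑based merge.

```python
class ReadPathNode:
    """Hypothetical single-node read path: row cache, memtable, SSTables, merge."""

    def __init__(self, memtable, sstables):
        self.row_cache = {}
        self.memtable = memtable    # key -> (timestamp, value)
        self.sstables = sstables    # list of dicts, each key -> (timestamp, value)
        self.sstables_read = 0      # counts SSTable probes, to show what caching saves

    def may_contain(self, sstable, key):
        # Stand-in for a Bloom filter; a real filter can return false positives.
        return key in sstable

    def read(self, key):
        if key in self.row_cache:                 # 1. row cache hit: done
            return self.row_cache[key]
        candidates = []
        if key in self.memtable:                  # 2. memtable lookup
            candidates.append(self.memtable[key])
        for sstable in self.sstables:             # 3. SSTables, skipping via the "filter"
            if self.may_contain(sstable, key):
                self.sstables_read += 1
                candidates.append(sstable[key])
        if not candidates:
            return None
        _, value = max(candidates, key=lambda c: c[0])  # 4. newest timestamp wins
        self.row_cache[key] = value               # 5. populate the row cache
        return value
```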

Compaction, Tombstones, and Data Growth

Deletes are implemented as tombstones: markers indicating that data (a cell, row, range, or whole partition) has been logically removed. Updates are treated as inserts with newer timestamps, so obsolete versions accumulate on disk. Compaction runs asynchronously to merge SSTables, discard superseded versions, and purge tombstones once they are older than the table's gc_grace_seconds (10 days by default), which prevents unbounded disk growth and improves read performance.
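A minimal compaction sketch, assuming each SSTable is a dict from key to (timestamp in seconds, value), with None standing in for a tombstone, and assuming every SSTable holding a key takes part in the compaction. Cassandra itself tracks write timestamps (microseconds) and deletion times separately, and also checks that no overlapping SSTable outside the compaction could still hold data the tombstone shadows. GC_GRACE_SECONDS mirrors the default of the gc_grace_seconds table option.

```python
from typing import Dict, List, Optional, Tuple

GC_GRACE_SECONDS = 864_000  # default gc_grace_seconds: 10 days

# Each SSTable maps key -> (timestamp_seconds, value); value None marks a tombstone.
SSTable = Dict[str, Tuple[int, Optional[str]]]


def compact(sstables: List[SSTable], now: int,
            gc_grace: int = GC_GRACE_SECONDS) -> SSTable:
    """Merge SSTables into one, keeping the newest version of each key and
    dropping tombstones whose grace period has expired."""
    newest: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    result: SSTable = {}
    for key in sorted(newest):          # SSTables are written in key order
        ts, value = newest[key]
        if value is None and now - ts > gc_grace:
            continue                    # expired tombstone: purged with the data it shadowed
        result[key] = (ts, value)
    return result
```

Keeping a tombstone until gc_grace_seconds has elapsed gives repairs time to propagate the delete to every replica, so an old value cannot reappear from a replica that missed it.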

Distributed Architecture, High Availability, and Consistency

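Cassandra partitions data by hashing the Partition Key to a token, and each node owns ranges of tokens on a ring. This enables true distribution, decentralization, and horizontal scalability: adding a node expands capacity by transferring part of the existing token ranges to it.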
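A sketch of token‑ring partitioning and replica placement. Assumptions: Ring and token are hypothetical names; tokens come from MD5 (as in the older RandomPartitioner) because Python's standard library has no Murmur3, whereas the default Murmur3Partitioner uses Murmur3; and replicas are the next rf nodes clockwise, similar in spirit to SimpleStrategy but without rack or datacenter awareness.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    # Illustration only: MD5-based 64-bit token, not Cassandra's Murmur3 token.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical single-token ring: a node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token; kept sorted by token.
        self.entries = sorted((t, n) for n, t in node_tokens.items())

    def replicas(self, partition_key: str, rf: int):
        """Owner of the key's token plus the next rf-1 nodes clockwise."""
        tokens = [t for t, _ in self.entries]
        # First node whose token >= key token; wrap around past the last node.
        start = bisect.bisect_left(tokens, token(partition_key)) % len(self.entries)
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]
```

In this single‑token model, adding a node inserts one more token, so only the range between the new node and its predecessor changes owner; all other ranges stay where they are.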

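Replication keeps multiple copies of each partition, and because no replica is a designated primary, a node failure does not require failover: the system continues serving reads and writes from the remaining replicas, provided enough of them are up to satisfy the requested consistency level.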

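Consistency is configurable per operation. For writes, levels such as ANY, ONE, QUORUM, and ALL determine how many replicas must acknowledge the write (ANY also accepts a stored hint when no replica is available); for reads, levels such as ONE and QUORUM determine how many replicas are consulted. Lower levels reduce latency but may return stale data. A read is guaranteed to see the latest acknowledged write only when the read and write replica counts overlap, that is, R + W > RF (for example, QUORUM for both), and this comes at the cost of higher latency.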
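The arithmetic behind these levels can be sketched for a single datacenter, where QUORUM is floor(RF/2) + 1. The functions below are hypothetical helpers; ANY is left out because a stored hint does not count as a replica for read overlap, and multi‑datacenter levels such as LOCAL_QUORUM and EACH_QUORUM are not modeled.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM in a single datacenter: floor(rf / 2) + 1."""
    return rf // 2 + 1


def required_acks(level: str, rf: int) -> int:
    # Simplified single-datacenter mapping from consistency level to replica count.
    table = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}
    return table[level]


def can_serve(level: str, rf: int, live_replicas: int) -> bool:
    """Whether enough replicas are up to satisfy the level."""
    return live_replicas >= required_acks(level, rf)


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf
```

With RF = 3 and one replica down, QUORUM reads and writes still succeed (2 of 3) and still overlap, while ALL fails.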

SSTable File Components

An SSTable consists of several files that together enable fast reads:

Filter.db – Bloom filter indicating possible key presence.

Summary.db – Sampled index for quick lookups.

Index.db – Offsets to rows in Data.db.

CompressionInfo.db – Metadata for any compression applied to Data.db.

Data.db – The actual stored rows.

Digest.adler32 – Checksum for data integrity.

Statistics.db – Statistics used by nodetool tablehistograms.

TOC.txt – Table of contents listing the component files.
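Filter.db holds a Bloom filter for each SSTable. The sketch below is a generic Bloom filter, not Cassandra's on‑disk format: it derives k bit positions from a SHA‑256 digest via double hashing, whereas Cassandra uses Murmur3 and sizes its filters from the table's bloom_filter_fp_chance setting. The class name and default sizes are hypothetical. A negative answer means the key is definitely absent, so the SSTable can be skipped; a positive answer may be a false positive.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key in an m-bit array.
    No false negatives; false positives are possible."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```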

In summary, the article covers Cassandra’s sparse, row‑oriented storage, its append‑only write path, the read flow, data‑growth management via tombstones and compaction, and how its distributed design provides high availability, fault tolerance, and tunable consistency.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
