Databases 22 min read

Consistency Models in Distributed Storage: Cosmos DB, Cassandra, OceanBase

This article explains the fundamentals of consistency in distributed storage systems, contrasts it with database transaction consistency, and details the various consistency levels offered by Azure Cosmos DB, Apache Cassandra, and OceanBase, highlighting their guarantees, configurations, and the performance‑availability trade‑offs involved.

Alibaba Cloud Developer
Alibaba Cloud Developer
Alibaba Cloud Developer
Consistency Models in Distributed Storage: Cosmos DB, Cassandra, OceanBase

Distributed Storage System Model

In distributed storage systems (including distributed databases like OceanBase), the term consistency is often used, but its meaning varies across systems and people, leading to confusion.

Consider a simple storage system with a single client (single process) and a single server (single‑process service). The client issues read/write operations sequentially, and the server processes each request in order, so later operations always see the results of earlier ones.

When the system becomes more complex with a single service process (single replica) but multiple concurrent clients, operations can interfere: a client may read data written by another client. This is similar to multi‑threaded programs sharing memory.

Adding another service process on a different machine to store a copy of the data creates multiple replicas . A client may send each operation to either replica. If replica synchronization is delayed, a read may miss a recent write, or a later read may miss an earlier write.

In real systems, many clients concurrently read and write multiple replicas. If their operations target the same data item (e.g., the same file range or the same database row), they affect each other. For example, should client B see the latest write from client A immediately?

Further, each service process may manage only a subset of the total data (partitioned tables). A write that touches multiple partitions may be applied to different replicas, leading to complex read/write anomalies.

From the above, a generic distributed storage system has the following characteristics:

Data is split into multiple shards stored on many service nodes.

Each shard has multiple replicas on different nodes.

Many clients concurrently perform read/write operations, each taking variable time.

Unless otherwise noted, operations are atomic.

Difference from Database Transaction Consistency

The consistency in the ACID definition of a database transaction is different. In ACID, consistency means that a transaction preserves global constraints (e.g., uniqueness, foreign keys). It has nothing to do with whether the data is replicated.

In distributed storage, consistency refers to the agreement among multiple replicas. Transaction isolation, which limits the visibility of concurrent operations, is also unrelated to replica consistency.

Client‑View Consistency Model

To keep replicas consistent, a system must ensure:

All replicas eventually receive every write operation (no matter how long it takes).

All replicas execute writes in a deterministic order.

From the client’s perspective, four guarantees are defined:

Monotonic reads : once a client has read version n, any later read must return version ≥ n.

Read your writes : after a client writes version n, subsequent reads must see at least version n.

Monotonic writes : a client’s writes are observed by all replicas in the same order they were issued.

Read‑after‑write : after reading version n, any subsequent write must be performed on a replica that is at least at version n.

Different consistency levels offered by a system provide some subset of these guarantees, trading off latency, availability, and scalability.

Cosmos DB Consistency Levels

Azure Cosmos DB, a globally distributed NoSQL service, offers five configurable consistency levels (from strongest to weakest):

Strong consistency : reads always return the latest version (linearizable). Writes must be committed to a majority of replicas before succeeding; reads require majority acknowledgments.

Bounded staleness : reads are at most K versions behind the latest write, using a sliding window to guarantee global order.

Session consistency : within a session, guarantees monotonic reads, monotonic writes, and read‑your‑writes; no guarantees across sessions.

Prefix consistency : if no new writes occur, all replicas eventually converge; reads never see out‑of‑order writes (e.g., they may see A, A‑B, or A‑B‑C but never A‑C).

Eventual consistency : the weakest guarantee; replicas eventually converge, but reads may return older data.

Approximately 73% of Cosmos DB users choose session consistency, while 20% choose bounded staleness.

Cassandra Consistency Levels

Cassandra uses a quorum‑based protocol. By configuring the number of replicas that must acknowledge reads and writes, different consistency levels are achieved. Writes are performed via a PUT (upsert) operation, which is idempotent and has overwrite semantics.

Write configurations

ALL: write must be persisted to all replicas.

EACH_QUORUM: write must reach a quorum in each data‑center.

QUORUM: write must reach a majority of replicas.

LOCAL_QUORUM: write must reach a majority within the local data‑center.

ONE: write must reach at least one replica.

TWO: write must reach at least two replicas.

THREE: write must reach at least three replicas.

LOCAL_ONE: write must reach at least one replica in the local data‑center.

Read configurations

ALL: read returns only after all replicas respond.

QUORUM: read returns after a majority respond.

LOCAL_QUORUM: read returns after a majority in the local data‑center respond.

ONE: read returns after the nearest replica responds (may be stale).

TWO, THREE, LOCAL_ONE: similar to the write variants, requiring two, three, or one local replica responses.

When the sum of write and read quorum sizes exceeds the total replica count, Cassandra provides strong consistency; otherwise it provides eventual consistency.

OceanBase Consistency Levels

OceanBase uses a Multi‑Paxos consensus algorithm. Transactions are committed only after a majority of replicas acknowledge, and a leader replica executes all reads and writes to provide strong consistency. If the leader fails, another majority replica becomes the new leader, ensuring no data loss.

OceanBase offers four consistency levels:

Strong consistency : reads and writes go to the leader; a majority of replicas must confirm writes.

Bounded staleness : reads may be served by any replica as long as the data is no older than a configured time window (e.g., 30 seconds).

Prefix consistency : within a client session, reads are monotonic, guaranteeing that a client never sees out‑of‑order writes.

Eventual consistency : reads may be served by any replica without waiting for log replay; data may be stale but availability and latency are maximized.

Weak consistency levels relax read semantics while still requiring writes to go through the leader, thus preserving monotonic writes and read‑after‑write guarantees, but not read‑your‑writes.

References

Consistency model – Wikipedia: https://en.wikipedia.org/wiki/Consistency_model

Tunable data consistency levels in Azure Cosmos DB: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels

Configuring data consistency in Apache Cassandra: https://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

Cassandra lightweight transactions documentation.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

databasesconsistency modelsOceanBasecassandracosmos db
Alibaba Cloud Developer
Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.