Databases 17 min read

Understanding Database Isolation Levels and Dirty Reads

This article explains the concepts of isolation levels, dirty reads, and phantom reads across various relational and NoSQL databases, describing how each system implements or lacks transaction isolation and the practical impact on data consistency and performance.

Architects Research Society
Architects Research Society
Architects Research Society
Understanding Database Isolation Levels and Dirty Reads

Key Points

Knowing a database's supported isolation levels is essential beyond just ACID or non‑ACID classification.

Some databases marketed as "eventually consistent" may return results that are inconsistent at any point in time.

Many databases offer isolation levels higher than you may require.

Dirty reads can cause you to see two versions of the same record or miss a record entirely.

Repeated queries within a single transaction can produce phantom rows.

When developer David Glasser discovered MongoDB's default handling of dirty reads, MongoDB again made headlines on Reddit. This article explains what isolation levels and dirty reads are and how they are realized in popular databases.

ANSI SQL defines four standard isolation levels: Serializable, Repeatable Read, Read Committed, and Read Uncommitted.

Most databases default to "Read Committed", which guarantees that you will not see intermediate data while a transaction is in progress. It does this by briefly acquiring a lock during the read while keeping write locks until the transaction commits.

If you need to repeat the same read operation multiple times within a transaction and want to be sure it always returns the same value, you must keep the read lock for the entire duration. Using the "Repeatable Read" isolation level automatically does this.

We call "Repeatable Read" "reasonably safe" because phantom reads can still occur. When a query with a WHERE clause (e.g., WHERE Status = 1 ) is executed, rows matching the condition are locked, but nothing prevents new rows that satisfy the condition from being inserted, leading to phantom rows on the second execution.

To absolutely guarantee that two reads in the same transaction return identical data, you can use the Serializable isolation level, which employs range locks to prevent insertion of new rows that match the WHERE clause of the open transaction.

Higher isolation levels generally degrade performance due to increased lock contention. To improve read performance, some databases also support "Read Uncommitted", which ignores locks (known as NOLOCK in SQL Server) and thus permits dirty reads.

Dirty Read Issues

Before discussing dirty reads, understand that a table is only a logical construct; the actual data resides in one or more indexes. In most relational databases the primary index is called a "clustered index" or a "heap". Updating a row requires modifications to each index that references the changed columns, often involving a delete‑then‑insert operation.

Dirty reads can manifest in several ways:

Uncommitted reads are the simplest case. By using a SELECT with the "Read Uncommitted" hint, you can see rows that have been inserted or updated but not yet committed; if the transaction later rolls back, the SELECT has returned data that never existed logically.

During an update, two reads may occur. If you read all customers in California, and an update moves a customer from California to Texas between your reads, you may see the same customer twice—once with the old value and once with the new value.

Similarly, a "lost read" can happen. If a customer is moved from Texas to Alaska and you query by state, you might miss the record entirely, which is what happened in the MongoDB example described by David Glasser.

Depending on the database’s design and execution plan, dirty reads can also interfere with ordering. For example, if the engine collects pointers to rows, updates a row, and then copies data from the original location using those pointers, the result may be inconsistent.

Snapshot Isolation or Row‑Level Versioning

To provide good performance while avoiding dirty reads, many databases support snapshot isolation semantics. Under snapshot isolation, a transaction cannot see the results of any other transaction that started after it.

This is achieved by creating a temporary copy of the rows to be modified instead of relying solely on locks, a technique commonly called "row‑level versioning".

When you request "Read Committed" isolation, most databases that support snapshot semantics automatically use it.

Isolation Levels in SQL Server

SQL Server supports all four ANSI SQL isolation levels plus an explicit snapshot level. Depending on how the READ_COMMITTED_SNAPSHOT option is configured, "Read Committed" can also use snapshot semantics.

Before enabling this option, test thoroughly; it can improve read performance but may slow writes, especially if tempdb resides on a slow drive because tempdb stores the old row versions.

The infamous NOLOCK hint applied to a SELECT statement has the same effect as running the statement in a transaction set to "Read Uncommitted". Although NOLOCK is discouraged, the habit persists.

For more details, see the Transact‑SQL documentation for SET TRANSACTION ISOLATION LEVEL.

Isolation Levels in PostgreSQL

PostgreSQL officially supports all four ANSI levels, but in practice it only provides three. When a query requests "Read Uncommitted", PostgreSQL silently upgrades it to "Read Committed", thus disallowing dirty reads.

When the level is set to Read Uncommitted, you actually get Read Committed, and the PostgreSQL implementation of Repeatable Read prevents phantom reads, making the effective isolation stricter than requested.

PostgreSQL does not expose a separate snapshot isolation level; it automatically uses MVCC (multi‑version concurrency control) when "Read Committed" is selected.

Prior to version 9.1, PostgreSQL did not provide a true Serializable level and would downgrade such requests to Repeatable Read. This limitation still exists in some older versions.

Isolation Levels in MySQL

InnoDB defaults to "Repeatable Read" but offers all four ANSI levels. "Read Committed" uses snapshot isolation semantics.

For more information, see the InnoDB documentation on transaction isolation levels.

The MyISAM storage engine does not support transactions at all; it uses table‑level read/write locks, although inserts can sometimes bypass the lock.

Isolation Levels in Oracle

Oracle supports three isolation levels: Read Committed, Serializable, and Read‑Only. The default is Read Committed, which uses snapshot semantics.

Like PostgreSQL, Oracle does not allow "Read Uncommitted"; dirty reads are impossible.

Oracle does not expose a separate Repeatable Read level; to obtain that behavior you must use the Serializable level.

Read‑only transactions see only changes that were committed before the transaction began and do not permit INSERT, UPDATE, or DELETE statements.

Isolation Levels in DB2

DB2 defines four isolation levels—Repeatable Read, Read Stability, Cursor Stability, and Uncommitted Read—but they do not map directly to ANSI terminology.

In DB2, Repeatable Read corresponds to ANSI Serializable, preventing phantom reads.

Read Stability maps to ANSI Repeatable Read.

By default, "Cursor Stability" provides "Read Committed" semantics; from version 9.7 onward, snapshot semantics are used.

Uncommitted Read permits dirty reads, similar to SQL Server's uncommitted read, and is recommended only for read‑only tables or when seeing uncommitted data is acceptable.

Isolation Levels in MongoDB

MongoDB does not support multi‑document transactions. Single‑document operations are atomic, and two‑phase commit can emulate transaction‑like semantics, but intermediate states may be visible, effectively providing dirty‑read behavior where records can be duplicated or lost.

Isolation Levels in CouchDB

CouchDB also lacks transactions but uses multi‑version concurrency control to prevent dirty reads.

Read requests always see the latest snapshot of the database at the start of the request.

This makes CouchDB equivalent to a Read Committed isolation level with snapshot semantics.

Isolation Levels in Couchbase Server

Couchbase Server does not have a notion of isolation for indexes. Updates modify the primary index immediately, while secondary indexes are updated lazily.

Although the documentation is unclear, it appears that snapshot semantics are used when building indexes, so dirty reads should not be an issue, but delayed index updates prevent true "Read Committed" isolation.

Couchbase does not support transactions directly, but explicit locks can be used; they expire after 30 seconds.

Isolation Levels in Cassandra

In Cassandra 1.0 there was no isolation for writes; a read could return a mix of old and new values.

Starting with version 1.1, Cassandra provides "row‑level isolation", which is equivalent to the "Read Uncommitted" level; higher isolation levels are not possible.

Understanding Database Isolation Levels

As the examples show, treating a database simply as ACID or non‑ACID is insufficient; you must know which isolation levels it supports and under what circumstances they apply.

Source: https://www.infoq.com/articles/Isolation-Levels/

SQLtransactionDatabaseNoSQLIsolation LevelDirty Read
Architects Research Society
Written by

Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.