Databases 19 min read

Understanding Database Isolation Levels and Dirty Reads

This article explains database isolation levels, including read committed, repeatable read, serializable, and snapshot isolation, discusses dirty reads, phantom reads, and how various systems such as SQL Server, PostgreSQL, MySQL, Oracle, MongoDB, and others implement or differ in these isolation semantics.

Architects Research Society
Architects Research Society
Architects Research Society
Understanding Database Isolation Levels and Dirty Reads

Important Points

Only thinking in terms of ACID or non‑ACID is insufficient; you also need to know the isolation levels supported by the database.

Some databases that claim "eventual consistency" may return results that are inconsistent at any point in time.

Certain databases provide isolation levels that are higher than what you require.

Dirty reads can cause you to see two versions of the same record, or miss a record entirely.

Repeated queries within a single transaction can produce phantom rows.

Recently, when developer David Glasser discovered MongoDB's poor default handling of dirty reads, MongoDB once again became a Reddit favorite. In this article we explain what isolation levels and dirty reads are and how they are implemented in popular databases.

In ANSI SQL there are four standard isolation levels: Serializable, Repeatable Read, Read Committed, and Read Uncommitted.

Many databases default to "Read Committed", which guarantees that you will not see intermediate data while the transaction is in progress. It achieves this by briefly acquiring a lock during the read while keeping write locks until the transaction commits.

If you need to repeat the same read operation multiple times within a transaction and want to be reasonably sure it always returns the same value, you must keep the read lock for the entire duration. Using the "Repeatable Read" isolation level does this automatically.

We call "Repeatable Read" "reasonably guaranteed" because phantom reads can still occur. When a query with a WHERE clause (e.g., "WHERE Status = 1") is executed, rows that satisfy the condition are locked, but nothing prevents new rows that also satisfy the condition from being inserted, leading to phantom rows on a subsequent execution.

To absolutely ensure that two reads in the same transaction return identical data, you can use the Serializable isolation level. This uses a "range lock" that prevents insertion of new rows matching the open transaction's WHERE clause.

Generally, higher isolation levels incur worse performance due to lock contention. To improve read performance, some databases also support "Read Uncommitted", which ignores locks (known as NOLOCK in SQL Server) and therefore performs dirty reads.

Dirty Read Issues

Before discussing dirty reads, you must understand that a table does not actually exist in the database; it is a logical construct. Data is stored in one or more indexes. In most relational databases the primary index is called a "clustered index" or a "heap" (the terminology differs for NoSQL databases). Therefore, an INSERT must add a row to each index, and an UPDATE typically touches the index entries for the modified columns, often performing a delete‑old‑entry and insert‑new‑entry operation.

The diagram below shows a simple table and an execution plan where two objects (IX_Customer_State and PK_Customer) are updated; the full name index IX_Customer_FullName is skipped because the full name did not change.

Note: In SQL Server, the PK prefix denotes the primary key, which is usually the clustered index key. IX denotes a non‑clustered index. Other databases have their own conventions.

Now let’s see the various ways dirty reads can cause data inconsistency.

Uncommitted reads are the easiest to understand. By ignoring write locks, a SELECT with the "Read Uncommitted" hint can see rows that have been inserted or updated before the transaction is fully committed. If the transaction later rolls back, the SELECT has returned data that never existed logically.

During an update, data movement causes two reads. Suppose you read all customers in California, and an update moves a customer from California to Texas while you are reading Texas records. You could see customer 1253 twice—once with the old value and once with the new value.

Lost reads happen the same way. If we move customer 1253 from Texas to Alaska and then query by state, the record may be completely missed. This is what happened with David Glasser’s MongoDB example: reading from the index during an update caused the record to be lost.

Depending on the database design and execution plan, dirty reads can also interfere with ordering. For example, if the engine gathers a set of pointers to all rows of interest, updates a row, and then copies data from the original location using those pointers, the ordering may become inconsistent.

Snapshot Isolation or Row‑Level Versioning

To provide good performance while avoiding dirty reads, many databases support snapshot isolation semantics. Under snapshot isolation, a transaction cannot see the results of any other transaction that started after it.

This is achieved by creating a temporary copy of the rows to be modified instead of relying solely on locks, a technique commonly called "row‑level versioning".

When "Read Committed" isolation is requested, most databases that support snapshot semantics automatically use it.

Isolation Levels in SQL Server

SQL Server supports all four ANSI SQL isolation levels plus an explicit snapshot level. Depending on how the READ_COMMITTED_SNAPSHOT option is configured, "Read Committed" can also use snapshot semantics.

Before enabling this option, thoroughly test your database. While it can improve read performance, it may slow writes, especially if tempdb resides on a slow drive, since tempdb stores the old row versions.

The infamous NOLOCK hint (applicable to SELECT statements) has the same effect as running a transaction with "Read Uncommitted". Because SQL Server 2000 and earlier did not provide row‑level versioning, this hint was widely used. Although it is no longer needed or recommended, the habit persists.

For more details see the SET TRANSACTION ISOLATION LEVEL (Transact‑SQL) documentation.

Isolation Levels in PostgreSQL

PostgreSQL officially supports all four ANSI isolation levels, but in practice it only provides three. When a query requests "Read Uncommitted", PostgreSQL silently upgrades it to "Read Committed", thus disallowing dirty reads.

When the Read Uncommitted level is chosen, PostgreSQL actually provides Read Committed, and its implementation of Repeatable Read prevents phantom reads, so the effective isolation may be stricter than requested. The SQL standard only defines phenomena that must not occur, not those that must occur.

PostgreSQL does not explicitly offer a snapshot isolation level; it occurs automatically when using "Read Committed" because PostgreSQL has always been built with multiversion concurrency control.

Prior to version 9.1, PostgreSQL did not provide true serializable transactions and silently downgraded them to "Repeatable Read". Current versions still have this limitation.

See section 13.2 Transaction Isolation for more information.

Isolation Levels in MySQL

InnoDB defaults to "Repeatable Read" but provides all four ANSI SQL isolation levels. "Read Committed" uses snapshot isolation semantics.

See the InnoDB documentation, section 15.3.2.1, for details on transaction isolation levels.

The MyISAM storage engine does not support transactions at all; it uses table‑level read/write locks, although inserts can sometimes bypass the lock.

Isolation Levels in Oracle

Oracle supports three transaction isolation levels: Read Committed, Serializable, and Read‑Only. The default is Read Committed, which uses snapshot semantics.

Like PostgreSQL, Oracle does not allow "Read Uncommitted" and therefore forbids dirty reads.

Oracle lacks an explicit "Repeatable Read" level; to obtain that behavior you must set the isolation level to Serializable.

Read‑Only transactions see only changes that were committed before the transaction started and do not allow INSERT, UPDATE, or DELETE statements.

See the documentation on Data Concurrency and Consistency for more details.

Isolation Levels in DB2

DB2 provides four isolation levels called Repeatable Read, Read Stability, Cursor Stability, and Uncommitted Read, though they do not map directly to ANSI terminology.

Repeatable Read in DB2 corresponds to the ANSI Serializable level, meaning phantom reads are impossible.

Read Stability maps to ANSI Repeatable Read.

By default, "Cursor Stability" implements "Read Committed". Starting with version 9.7, snapshot semantics are active; earlier versions used lock‑based behavior similar to SQL Server.

Uncommitted Read permits dirty reads, similar to SQL Server's uncommitted read, and is recommended only for read‑only tables or when seeing uncommitted data is acceptable.

See the DB2 documentation for more information on isolation levels.

Isolation Levels in MongoDB

MongoDB does not support multi‑document transactions. As the manual states, because only single‑document operations are atomic, two‑phase commits can only provide transaction‑like semantics, and during a two‑phase commit or rollback an application may see intermediate data.

Since MongoDB uses dirty‑read semantics, records may be duplicated or lost.

Isolation Levels in CouchDB

CouchDB also does not support transactions, but unlike MongoDB it uses multiversion concurrency control to prevent dirty reads.

Read requests always see the latest snapshot of the database at the start of the request.

This makes CouchDB equivalent to a Read Committed isolation level with snapshot semantics.

See the documentation on eventual consistency for more details.

Isolation Levels in Couchbase Server

Although often confused with CouchDB, Couchbase Server is a very different product and does not have a notion of isolation for its indexes.

During an update, only the primary index (and optionally a "real table") is updated; all secondary indexes are updated lazily.

The documentation is unclear, but it appears that snapshots are used when building indexes, so dirty reads should not be an issue. However, because of delayed index updates, a true "Read Committed" isolation level is still not achievable.

Like many NoSQL databases, Couchbase does not directly support transactions, but explicit locks can be used; these locks expire after 30 seconds.

See the Couchbase architecture and view engine documentation for more information.

Isolation Levels in Cassandra

In Cassandra 1.0 there was no isolation for writing a single row; fields were updated individually, so a read could return a mix of old and new values.

Starting with version 1.1, Cassandra provides "row‑level isolation", which corresponds to the "Read Uncommitted" isolation level of other databases. Higher isolation levels are not possible.

See the documentation on transactions and concurrency control for more details.

Understanding Database Isolation Levels

From the examples above it is clear that treating a database simply as ACID or non‑ACID is insufficient. You must know which isolation levels it supports and under what circumstances they apply.

SQLtransactionDatabaseIsolation LevelDirty Read
Architects Research Society
Written by

Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.