Databases 19 min read

Understanding Database Isolation Levels and Dirty Reads

This article explains the concepts of isolation levels, dirty reads, phantom rows, and how various relational and NoSQL databases implement these phenomena, providing practical guidance on choosing the appropriate isolation level to ensure data consistency and performance.

Architects Research Society

May 27, 2021

Understanding Database Isolation Levels and Dirty Reads

Key Points

Do not think only in terms of ACID or non‑ACID; you must know the isolation levels a database supports.

Some databases that claim “eventual consistency” may return results that are inconsistent at any point in time.

Some databases provide isolation levels higher than you require.

Dirty reads can cause you to see two versions of the same record or miss a record entirely.

Repeated queries in a single transaction may produce phantom rows.

Recently, when developer David Glasser discovered MongoDB's default handling of dirty reads, MongoDB once again made headlines on Reddit. In this article we explain what isolation levels and dirty reads are and how they are realized in popular databases.

In ANSI SQL there are four standard isolation levels: Serializable, Repeatable Read, Read Committed, and Read Uncommitted.

Many databases default to Read Committed , which guarantees that you will not see intermediate data while a transaction is in progress. It does this by briefly acquiring a lock during the read while keeping write locks until the transaction commits.

If you need to repeat the same read operation multiple times within a transaction and want to be sure it always returns the same value, you must keep the read lock for the entire duration. Using the Repeatable Read isolation level does this automatically.

We call “repeatable read” “reasonably guaranteed” because “phantom reads” can still occur. When a query with a WHERE clause (e.g., WHERE Status = 1) is executed, rows matching the condition are locked, but nothing prevents new rows that satisfy the condition from being inserted. The term “phantom” applies to rows that appear on the second execution of the query.

To absolutely guarantee that two reads in the same transaction return identical data, you can use the Serializable isolation level. This uses a range lock that prevents insertion of new rows that would match the WHERE clause of the open transaction.

Generally, higher isolation levels incur more lock contention and thus poorer performance. To improve read performance, some databases also support “Read Uncommitted”, which ignores locks (known as NOLOCK in SQL Server) and therefore performs dirty reads.

Dirty Read Issues

Before discussing dirty reads, you must understand that a table does not actually exist in the database; it is a logical construct. Data is stored in one or more indexes. In most relational databases the primary index is called a “clustered index” or a “heap”. When an insert occurs, a row must be added to each index; when an update occurs, the engine touches the index entries for the changed columns, usually performing a delete‑then‑insert for each index.

The diagram below shows a simple table and an execution plan where two objects, IX_Customer_State and PK_Customer, are updated. Because the full name did not change, the IX_Customer_FullName index is skipped.

Note: In SQL Server the PK prefix denotes the primary key, which is usually the clustered index key. IX denotes a non‑clustered index. Other databases have their own conventions.

Now let’s see the various ways dirty reads can cause data inconsistency.

Uncommitted reads are the easiest to understand. By ignoring write locks, a SELECT with the “Read Uncommitted” hint can see rows that have been inserted or updated before the transaction commits. If that transaction later rolls back, the SELECT has logically returned data that never existed.

During an update, data may be moved, causing two reads. Suppose you read all customers in California, and an update moves a customer from California to Texas while you are reading Texas records. You could see customer 1253 twice—once with the old value and once with the new value.

Lost reads happen the same way. If we move customer 1253 from Texas to Alaska and then query by state, the record may be completely missed. This is what happened with David Glasser’s MongoDB example: reading from the index during an update caused the record to be lost.

Depending on the database design and execution plan, dirty reads can also interfere with ordering. If the engine collects pointers to all interested rows, updates a row, and then copies data from the original location using those pointers, the result set may be inconsistent.

Snapshot Isolation or Row‑Level Versioning

To provide good performance while avoiding dirty reads, many databases support snapshot isolation semantics. Under snapshot isolation, a transaction cannot see the results of any other transaction that started after it.

This is achieved by creating a temporary copy of the rows to be modified instead of relying solely on locks, a technique often called “row‑level versioning”.

When “Read Committed Snapshot” is requested, most databases that support snapshot isolation automatically use it.

Isolation Levels in SQL Server

SQL Server supports all four ANSI SQL isolation levels plus an explicit snapshot level. Depending on how the READ_COMMITTED_SNAPSHOT option is configured, “Read Committed” can also use snapshot semantics.

Before enabling this option, test thoroughly: it can improve read performance but may slow writes, especially if tempdb resides on a slow drive, since that is where old row versions are stored.

The infamous NOLOCK hint (applicable to SELECT statements) has the same effect as running a transaction with “Read Uncommitted”. Although row‑level versioning is unavailable in SQL Server 2000 and earlier, the habit persists even though it is no longer recommended.

For more details see SET TRANSACTION ISOLATION LEVEL (Transact‑SQL).

Isolation Levels in PostgreSQL

PostgreSQL officially supports all four ANSI isolation levels, but in practice it only provides three. When a query requests “Read Uncommitted”, PostgreSQL silently upgrades it to “Read Committed”, thus disallowing dirty reads.

When “Read Uncommitted” is selected, you actually get “Read Committed”, and the PostgreSQL implementation of Repeatable Read prevents phantom reads, so the effective isolation may be stricter than requested. The SQL standard only defines phenomena that must not occur, not those that must occur.

PostgreSQL does not explicitly provide a snapshot isolation level; it occurs automatically when using “Read Committed” because PostgreSQL was designed from the start with multiversion concurrency control.

Before version 9.1 PostgreSQL did not provide a true serializable transaction and silently downgraded it to “Repeatable Read”. Current versions still have this limitation.

See section 13.2 “Transaction Isolation” for more information.

Isolation Levels in MySQL

InnoDB defaults to “Repeatable Read” but offers all four ANSI SQL isolation levels. “Read Committed” uses snapshot isolation semantics.

See the InnoDB documentation, section 15.3.2.1, for details on transaction isolation levels.

The MyISAM storage engine does not support transactions at all; it uses a table‑level read/write lock, although inserts can sometimes bypass the lock.

Isolation Levels in Oracle

Oracle supports three transaction isolation levels: Read Committed, Serializable, and Read‑Only. The default is Read Committed, which uses snapshot semantics.

Like PostgreSQL, Oracle does not allow “Read Uncommitted”, so dirty reads are impossible.

Oracle lacks an explicit “Repeatable Read” level; to obtain that behavior you must set the isolation level to Serializable.

Read‑Only transactions see only changes that were committed before the transaction started and do not allow INSERT, UPDATE, or DELETE statements.

See chapter 13 “Data Concurrency and Consistency” for more information.

Isolation Levels in DB2

DB2 provides four isolation levels called Repeatable Read, Read Stability, Cursor Stability, and Uncommitted Read, which do not map directly to ANSI terms.

Repeatable Read corresponds to ANSI Serializable (no phantom reads). Read Stability maps to ANSI Repeatable Read. By default, Cursor Stability implements “Read Committed”. Since version 9.7, snapshot semantics are active; earlier versions used lock‑based behavior similar to SQL Server.

Uncommitted Read permits dirty reads, similar to SQL Server’s uncommitted read, and is recommended only for read‑only tables or when seeing uncommitted data is acceptable.

See the DB2 documentation for isolation level details.

Isolation Levels in MongoDB

MongoDB does not support multi‑document transactions. As the manual states, only single‑document operations are atomic, and two‑phase commits can only provide transaction‑like semantics. During a two‑phase commit or rollback, an application may see intermediate data.

Effectively, MongoDB uses dirty‑read semantics, meaning records can be duplicated or lost.

Isolation Levels in CouchDB

CouchDB also does not support transactions, but unlike MongoDB it uses multiversion concurrency control to prevent dirty reads.

Read requests always see the latest snapshot of the database at the start of the request.

This makes CouchDB equivalent to a Read Committed isolation level with snapshot semantics.

See the documentation on eventual consistency for more information.

Isolation Levels in Couchbase Server

Although often confused with CouchDB, Couchbase Server is a very different product and does not have a concept of isolation for indexes.

During updates it only updates the primary index; secondary indexes are updated lazily. The documentation is unclear about snapshot usage, but if snapshots are used, dirty reads should not be an issue. However, because of delayed index updates you cannot achieve a true “Read Committed” isolation level.

Couchbase does not support transactions directly, but explicit locks can be used; they last 30 seconds and then expire automatically.

See the Couchbase architecture and lock documentation for details.

Isolation Levels in Cassandra

In Cassandra 1.0 there was no isolation for a single row write; fields were updated one by one, so a read could see a mix of old and new values.

Starting with version 1.1, Cassandra provides “row‑level isolation”, which gives it the same isolation as “Read Uncommitted” in other databases. Higher isolation levels are not possible.

See the Cassandra documentation on transactions and concurrency control for more information.

Understanding Database Isolation Levels

From the examples above it is clear that treating a database simply as ACID or non‑ACID is insufficient. You must know which isolation levels it supports and under what circumstances they apply.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL

Written by

Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.