Why Go’s database/sql Deadlock Needed a New closingMutex and How It Works

Go’s database/sql package can deadlock when a Scan holds a read lock, a cancelled context triggers Close, and Columns attempts another read lock. The new closingMutex introduced in Go’s source replaces the generic sync.RWMutex, allowing new reads while a write is pending and preventing the deadlock.

Tech Musings

Introduction

sync.RWMutex is a common Go concurrency primitive designed for the typical read‑write pattern: many reads, few writes. When a write operation is waiting, new read requests are blocked. That rule avoids writer starvation, but it can cause a real deadlock in database/sql through the interplay of Rows.Scan, Rows.Columns, and the Close call order.

Why a closingMutex Is Needed

1.1 When is sync.RWMutex Suitable?

sync.RWMutex works well for the classic pattern where multiple readers proceed concurrently and a writer needs exclusive access. In that case, once a writer is waiting, new readers are blocked to prevent writer starvation.
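
That classic pattern can be sketched as a read‑mostly cache (a minimal illustration, not from the database/sql source):

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a read-mostly map guarded by sync.RWMutex.
type cache struct {
	mu sync.RWMutex
	m  map[string]string
}

// Get takes a shared read lock: many readers may hold it concurrently.
func (c *cache) Get(k string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[k]
	return v, ok
}

// Set takes the exclusive write lock.
func (c *cache) Set(k, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = v
}

func main() {
	c := &cache{m: map[string]string{}}
	c.Set("a", "1")
	v, ok := c.Get("a")
	fmt.Println(v, ok) // 1 true
}
```

Here the lock protects shared business state, which is exactly the situation sync.RWMutex was designed for.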

In database/sql, the lock is not used for ordinary data updates; it protects the Close operation, i.e., cleanup, not shared business state. The crucial constraint is that the exit process must not block still‑running reads.

1.2 What Is closingMutex ?

closingMutex is Go’s official fix for the issue “database/sql: avoid deadlock from reentrant RLock”. It retains the read‑shared/write‑exclusive form but changes the priority rule:

When a write lock is actually held, readers must wait.

When a write lock is only waiting, new readers are still allowed.

Note: This is a targeted adjustment for the Close operation, not a universal read‑write lock replacement.

1.3 Mapping the Fix to the Source

The upstream change replaces the field declaration in src/database/sql/sql.go, from:

closemu sync.RWMutex

to:

closemu closingMutex

How the Deadlock Happens

database/sql.Rows must coordinate two kinds of operations:

1. Read operations such as Scan, Columns, and Next.

2. The Close cleanup operation.

The old implementation used sync.RWMutex (read via RLock, close via Lock). The deadlock arises from the following three‑step sequence:

1. Call Rows.Scan, which reads into sql.RawBytes → Rows holds a read lock.
   Because RawBytes references an internal buffer, the read lock is not released immediately after Scan returns.

2. The context is cancelled → a background goroutine calls rows.close().
   Close tries to acquire the write lock but is blocked by the read lock from step 1.

3. Call Rows.Columns → Columns needs to acquire a read lock again.
   Since a write operation is waiting, sync.RWMutex blocks the new RLock, forming a circular wait and causing a deadlock.

The key is the semantics of sync.RWMutex: when a writer is waiting, new readers are blocked, creating a cycle between the pending writer and the new reader.
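
The cycle can be demonstrated with a plain sync.RWMutex, independent of any database (a minimal standalone sketch of the three steps above):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	mu.RLock() // step 1: a reader holds the lock (like Scan + RawBytes)

	go func() {
		mu.Lock() // step 2: a writer starts waiting (like background Close)
		mu.Unlock()
	}()
	time.Sleep(200 * time.Millisecond) // give the writer time to register

	done := make(chan struct{})
	go func() {
		mu.RLock() // step 3: a second reader (like Columns)
		mu.RUnlock()
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("second RLock succeeded")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("second RLock blocked: deadlock pattern")
	}
}
```

Because sync.RWMutex blocks new readers as soon as a writer is waiting, the second RLock never returns while the first read lock is held, which is exactly the cycle database/sql hit.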

Reproducing the Issue with SQLite

The following test reproduces the deadlock in a real database/sql call chain:

func TestSQLiteRealDeadlock(t *testing.T) {
    result := make(chan string, 1)

    go func() {
        // assumes an SQLite driver registered under the name "sqlite"
        db, _ := sql.Open("sqlite", ":memory:")
        db.Exec(`CREATE TABLE documents (id INTEGER, title TEXT, data BLOB)`)
        db.Exec(`INSERT INTO documents VALUES (1, 'test', ?)`, make([]byte, 1024))

        ctx, cancel := context.WithCancel(context.Background())
        rows, _ := db.QueryContext(ctx, "SELECT * FROM documents")
        rows.Next()

        var id int
        var title string
        var rb sql.RawBytes
        rows.Scan(&id, &title, &rb)

        cancel()
        time.Sleep(200 * time.Millisecond)

        rows.Columns()
        result <- "done"
    }()

    select {
    case <-result:
        t.Fatal("expected deadlock, but Columns returned")
    case <-time.After(3 * time.Second):
        t.Log("deadlock confirmed")
    }
}

This code demonstrates the three‑part deadlock scenario: RawBytes holds the lock, context cancellation triggers background Close, and Columns attempts another read lock.

How closingMutex Fixes the Problem

The Go team’s fix replaces the lock only for the Close synchronization, leaving the outer API unchanged.

Core Rules

When a write lock is truly held, readers must wait.

When a write lock is only waiting, new readers may still acquire the lock.

This differs from sync.RWMutex, which blocks new readers as soon as a writer is waiting. The new lock prioritises allowing pending reads to finish before the close proceeds.

State Encoding

type closingMutex struct {
    // state = 2*readers + writerWaitingBit
    //    0 : unlocked
    //    1 : no readers, writer waiting
    // >= 2 : state/2 readers; the low bit indicates a waiting writer
    //   -1 : write lock held
    state atomic.Int64
    mu    sync.Mutex
    read  *sync.Cond
    write *sync.Cond
}

The design packs the reader count and writer‑waiting flag into a single atomic integer, using CAS for fast paths and sync.Cond only when waiting is required.
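
The packing can be made concrete with a small decoder (a sketch mirroring the comment above; decode is a hypothetical helper, not part of the Go source):

```go
package main

import "fmt"

func main() {
	// decode unpacks state = 2*readers + writerWaitingBit, with -1 meaning
	// the write lock is held.
	decode := func(x int64) (readers int64, writerWaiting, writerHeld bool) {
		if x < 0 {
			return 0, false, true
		}
		return x / 2, x&1 == 1, false
	}
	for _, x := range []int64{0, 1, 2, 3, 5, -1} {
		r, ww, wh := decode(x)
		fmt.Printf("state=%2d readers=%d writerWaiting=%v writerHeld=%v\n", x, r, ww, wh)
	}
}
```

For example, state 3 means one reader is active while a writer waits, and state 5 means two readers plus a waiting writer.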

TryRLock – Writer Waiting Is Not a Blocking Condition

func (m *closingMutex) TryRLock() bool {
    for {
        x := m.state.Load()
        if x < 0 { // writer already holds lock
            return false
        }
        if m.state.CompareAndSwap(x, x+2) {
            return true
        }
    }
}

The crucial check is x < 0: only when the writer actually holds the lock does the read fail. If the state is 1 (writer waiting) or 3 (one reader + writer waiting), the read can still succeed.

Lock – Writer Registers, Then Waits for the Last Reader

func (m *closingMutex) Lock() {
    m.mu.Lock()
    defer m.mu.Unlock()
    for {
        x := m.state.Load()
        // no readers left (possibly only the waiting bit): take the lock
        if (x == 0 || x == 1) && m.state.CompareAndSwap(x, -1) {
            return
        }
        // readers present: set the writer-waiting bit; retry on CAS failure
        if x&1 == 0 && !m.state.CompareAndSwap(x, x|1) {
            continue
        }
        m.init() // lazily create the condition variables
        m.write.Wait() // sleep until the last reader broadcasts
    }
}

When a writer arrives and readers are still active, the lowest bit is set to indicate a waiting writer, but new readers are not blocked. The flag merely tells the last exiting reader to wake the writer.

RUnlock – The Last Reader Wakes the Writer

func (m *closingMutex) RUnlock() {
    for {
        x := m.state.Load()
        if x < 2 {
            panic("runlock of unlocked mutex")
        }
        if m.state.CompareAndSwap(x, x-2) {
            if x-2 == 1 { // last reader left, writer waiting
                m.mu.Lock()
                defer m.mu.Unlock()
                m.write.Broadcast()
            }
            return
        }
    }
}

When x-2 == 1, all readers have finished while a writer is still waiting, so the writer is awakened.

Putting the Fix Back into the Deadlock Scenario

Re‑running the three‑step sequence with closingMutex yields the following state transitions:

initial state = 0
1. Scan acquires read lock → state: 0 → 2
2. Background Close arrives, sees a reader → state: 2 → 3 (1 reader + writer waiting)
3. Columns acquires another read lock → state: 3 → 5 (still allowed)
4. Columns finishes, releases one read lock → state: 5 → 3
5. RawBytes read lock releases → state: 3 → 1 (writer waiting, no readers)
   → last reader triggers wake‑up
6. Close obtains write lock → state: 1 → -1
7. Close completes → state: -1 → 0

If sync.RWMutex were used, step 3 would block at state 3, causing the deadlock. The fix’s key is allowing reads while a writer is merely waiting.
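
The transitions above can be exercised with a simplified standalone version of the lock, assembled from the snippets in this article (a sketch, not the Go source: the read condition variable is omitted because this demo never takes a read lock while the write lock is held, and Unlock simply resets the state):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// closingMutex is a simplified sketch: state = 2*readers + writerWaitingBit,
// -1 means the write lock is held.
type closingMutex struct {
	state atomic.Int64
	mu    sync.Mutex
	write *sync.Cond
}

func (m *closingMutex) init() {
	if m.write == nil {
		m.write = sync.NewCond(&m.mu)
	}
}

func (m *closingMutex) TryRLock() bool {
	for {
		x := m.state.Load()
		if x < 0 { // fail only when the writer actually holds the lock
			return false
		}
		if m.state.CompareAndSwap(x, x+2) {
			return true
		}
	}
}

func (m *closingMutex) RUnlock() {
	for {
		x := m.state.Load()
		if x < 2 {
			panic("runlock of unlocked mutex")
		}
		if m.state.CompareAndSwap(x, x-2) {
			if x-2 == 1 { // last reader left, writer waiting: wake it
				m.mu.Lock()
				m.write.Broadcast()
				m.mu.Unlock()
			}
			return
		}
	}
}

func (m *closingMutex) Lock() {
	m.mu.Lock()
	defer m.mu.Unlock()
	for {
		x := m.state.Load()
		if (x == 0 || x == 1) && m.state.CompareAndSwap(x, -1) {
			return
		}
		if x&1 == 0 && !m.state.CompareAndSwap(x, x|1) {
			continue
		}
		m.init()
		m.write.Wait()
	}
}

// Unlock is simplified: it ignores waiting readers, which the real fix handles.
func (m *closingMutex) Unlock() { m.state.Store(0) }

func main() {
	var m closingMutex
	m.TryRLock() // step 1: Scan holds a read lock (state 2)

	done := make(chan struct{})
	go func() {
		m.Lock() // step 2: Close waits (state 3)
		m.Unlock()
		close(done)
	}()
	time.Sleep(100 * time.Millisecond)

	// step 3: Columns can still take a read lock while the writer waits
	fmt.Println("TryRLock while writer waiting:", m.TryRLock())
	m.RUnlock()

	m.RUnlock() // steps 5-6: last reader leaves and wakes the writer
	<-done
	fmt.Println("Close acquired and released the write lock")
}
```

Unlike the sync.RWMutex demo of the deadlock, the second read lock here succeeds even though a writer is waiting, and the writer still acquires the lock once the last reader leaves.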

Testing the Fix

Official Test Highlights

m.RLock()                     // reader 1 holds the lock
lock3Done := start(t, m.Lock) // a writer begins waiting in a goroutine

m.RLock()   // a new reader still succeeds while the writer waits
m.RUnlock()

m.RUnlock()        // the last reader leaves
wait(t, lock3Done) // the writer now acquires the lock

The test confirms that after a writer starts waiting, a new RLock still succeeds, which would not happen with a plain sync.RWMutex.

Conclusion

The difference between concurrency primitives is not only about performance but also about matching the semantics of the problem at hand. sync.RWMutex’s writer priority suits most read‑heavy workloads, but in the database/sql close path it can cause deadlocks. The specialized closingMutex changes only one rule, whether a waiting writer blocks new readers, enabling the exit operation to coexist safely with in‑flight reads.

Written by Tech Musings

Capturing thoughts and reflections while coding.