Is Raising work_mem from 4 MB to 64 MB Really Optimizing Sorts? The 2 TB PostgreSQL OOM Time Bomb
The article explains why increasing PostgreSQL's work_mem does not guarantee per‑query memory limits, how multiple sort/hash nodes, parallel workers and long‑lived memory contexts can cause OOM even on a 2 TB server, and offers concrete diagnostics and mitigation strategies for DBAs and developers.
Conclusion: work_mem is often misunderstood
Many assume that raising work_mem from 4 MB to 64 MB simply makes sorting and hashing faster, but in production it can plant a hidden memory bomb.
Why work_mem alone cannot prevent OOM
PostgreSQL documentation describes work_mem as the base maximum amount of memory a single query operation may use before writing to temporary disk files. Typical consumers are sort operations and hash tables. A complex query may contain several such nodes, each entitled to its own work_mem, and concurrent sessions multiply the total again. hash_mem_multiplier (default 2.0 since PostgreSQL 15) further raises the limit for hash‑based operations, and each parallel worker gets its own budget, so a query running 4 workers plus the leader can use up to five times the memory of the same plan executed without parallelism.
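To make the multiplication concrete, here is a rough upper-bound calculator. It is a back-of-the-envelope sketch, not how PostgreSQL accounts memory: it assumes every sort-type node can reach work_mem, every hash-type node can reach work_mem × hash_mem_multiplier, and every node runs in the leader and in each worker. The function name worst_case_bytes and the workload numbers are hypothetical.

```python
# Rough upper bound on memory governed by work_mem (illustrative sketch).
# Assumptions: each sort-type node may use up to work_mem, each hash-type
# node up to work_mem * hash_mem_multiplier, and every node runs in the
# leader and in each parallel worker. Real usage is usually lower.

MB = 1024 * 1024


def worst_case_bytes(work_mem_mb: float, sort_nodes: int, hash_nodes: int,
                     hash_mem_multiplier: float = 2.0,
                     parallel_workers: int = 0, sessions: int = 1) -> int:
    """Return a worst-case byte estimate for `sessions` identical queries."""
    per_process_mb = (sort_nodes * work_mem_mb
                      + hash_nodes * work_mem_mb * hash_mem_multiplier)
    processes_per_query = 1 + parallel_workers  # leader plus workers
    return int(per_process_mb * MB * processes_per_query * sessions)


if __name__ == "__main__":
    # Hypothetical workload: 64 MB work_mem, 2 sorts + 2 hashes per plan,
    # 4 parallel workers, 200 concurrent sessions.
    total = worst_case_bytes(64, sort_nodes=2, hash_nodes=2,
                             parallel_workers=4, sessions=200)
    print(f"worst case: {total / 1024**3:.1f} GiB")  # prints "worst case: 375.0 GiB"
```

With these assumed numbers the bound is 375 GiB, more than a 256 GB server has. Real plans rarely hit every limit at once, but the arithmetic shows why the same setting can be harmless at low concurrency and dangerous at high concurrency.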
In a real incident, a production cluster with 2 TB of RAM was hit by the OOM killer even though work_mem was set to only 2 MB. Using pg_log_backend_memory_contexts, the author observed:
ExecutorState ≈ 235 MB
HashTableContext ≈ 340 MB
Total ≈ 557 MB
ExecutorState contained 524,059 chunks
The key insight is that memory allocated in long‑lived contexts (e.g., ExecutorState) is not released until the entire query finishes, allowing memory to snowball.
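The effect of context lifetime can be shown with a toy model. The Python below is a hypothetical simplification, not PostgreSQL's memory-context code: contexts form a tree, and allocations are freed only when their context is reset. The class MemoryContext and the function peak_memory are illustrative names. Allocating per-row data in a per-tuple context that is reset after every row keeps the peak flat; allocating the same data in an ExecutorState-like context makes the peak grow with the row count.

```python
# Toy model of memory-context lifetimes (a simplification, not PostgreSQL code).
from typing import List, Optional


class MemoryContext:
    """A named pool of allocations; resetting frees it and its children."""

    def __init__(self, name: str, parent: Optional["MemoryContext"] = None):
        self.name = name
        self.used = 0
        self.children: List["MemoryContext"] = []
        if parent is not None:
            parent.children.append(self)

    def alloc(self, nbytes: int) -> None:
        self.used += nbytes

    def reset(self) -> None:
        self.used = 0
        for child in self.children:
            child.reset()

    def total(self) -> int:
        """Bytes held by this context and all of its descendants."""
        return self.used + sum(child.total() for child in self.children)


def peak_memory(rows: int, bytes_per_row: int, long_lived: bool) -> int:
    """Simulate a query and return the peak bytes held under ExecutorState."""
    executor_state = MemoryContext("ExecutorState")
    per_tuple = MemoryContext("per-tuple", parent=executor_state)
    peak = 0
    for _ in range(rows):
        target = executor_state if long_lived else per_tuple
        target.alloc(bytes_per_row)
        peak = max(peak, executor_state.total())
        per_tuple.reset()  # short-lived memory is released after every row
    return peak


if __name__ == "__main__":
    print(peak_memory(1000, 1024, long_lived=False))  # 1024
    print(peak_memory(1000, 1024, long_lived=True))   # 1024000
```

Processing 1,000 rows of 1 KiB each peaks at 1 KiB in the first case and at about 1 MB in the second: the same snowball pattern, scaled down.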
What work_mem actually controls
Officially, work_mem is the threshold used "before writing to temporary disk files": a node‑level memory‑to‑disk switch. It influences whether a sort or hash stays in memory or spills early, but it does not bound the total memory a query will ultimately consume.
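The node-level switch can be illustrated with a generic external merge sort. This sketch is not PostgreSQL's tuplesort, and it assumes a budget counted in items rather than bytes (the parameter budget_items and the function external_sort are hypothetical). Values are buffered in memory until the budget is exceeded; each full buffer is sorted and spilled to a temporary file as a run, and the runs are merged at the end.

```python
# Generic external merge sort with a per-node memory budget (illustration,
# not PostgreSQL's tuplesort). The budget is counted in items, not bytes.
import heapq
import tempfile
from typing import IO, Iterable, List, Tuple


def _spill(sorted_run: List[int]) -> IO[str]:
    """Write one sorted run to an anonymous temporary file."""
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines(f"{v}\n" for v in sorted_run)
    f.seek(0)
    return f


def external_sort(values: Iterable[int], budget_items: int) -> Tuple[List[int], int]:
    """Return (sorted values, number of runs spilled to disk)."""
    buffer: List[int] = []
    runs: List[IO[str]] = []
    for v in values:
        buffer.append(v)
        if len(buffer) > budget_items:  # budget exceeded: spill a sorted run
            runs.append(_spill(sorted(buffer)))
            buffer = []
    if not runs:
        return sorted(buffer), 0  # everything fit: plain in-memory sort
    if buffer:
        runs.append(_spill(sorted(buffer)))
    try:
        # k-way merge of the sorted runs read back from disk
        merged = list(heapq.merge(*(map(int, f) for f in runs)))
    finally:
        for f in runs:
            f.close()
    return merged, len(runs)


if __name__ == "__main__":
    print(external_sort([5, 3, 9, 1, 7, 2, 8], budget_items=3))
    # ([1, 2, 3, 5, 7, 8, 9], 2): same result as an in-memory sort, two runs on disk
```

The budget only decides how this one node behaves. Nothing in it knows about other sorts, hashes, workers or sessions, which is exactly why a per-node threshold cannot cap a whole query.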
Therefore, simply asking "Is 64 MB work_mem safe for a 256 GB server?" misses the fundamental questions:
How many concurrent sessions are running?
How many memory‑intensive nodes exist in the plan?
Is parallel execution enabled?
What is hash_mem_multiplier set to for hash‑based nodes?
Are the planner's row estimates wrong because of stale or missing statistics?
Does the SQL combine many stages into a single long‑lived execution unit?
Do functions, CTEs, sub‑queries or JOIN patterns delay memory release?
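Several of these questions can be answered from the plan itself. The sketch below walks the JSON produced by EXPLAIN (FORMAT JSON) and counts sort-type nodes, hash-based nodes (Hash nodes plus hashed aggregates) and the largest Workers Planned value. The key names follow PostgreSQL's JSON EXPLAIN format; which node types count as memory-intensive is a simplifying assumption, the function name plan_memory_profile is hypothetical, and the sample plan is hand-written.

```python
# Count memory-intensive nodes and planned workers in EXPLAIN (FORMAT JSON)
# output. Which node types count as "memory-intensive" is an assumption.
import json
from typing import Dict, Iterator

SORT_TYPES = {"Sort", "Incremental Sort"}


def walk(node: dict) -> Iterator[dict]:
    """Yield a plan node and all of its descendants."""
    yield node
    for child in node.get("Plans", []):
        yield from walk(child)


def plan_memory_profile(explain_json: str) -> Dict[str, int]:
    root = json.loads(explain_json)[0]["Plan"]
    profile = {"sort_nodes": 0, "hash_nodes": 0, "workers_planned": 0}
    for node in walk(root):
        node_type = node["Node Type"]
        if node_type in SORT_TYPES:
            profile["sort_nodes"] += 1
        elif node_type == "Hash" or (
            node_type == "Aggregate" and node.get("Strategy") in ("Hashed", "Mixed")
        ):
            profile["hash_nodes"] += 1
        if "Workers Planned" in node:  # set on Gather / Gather Merge nodes
            profile["workers_planned"] = max(profile["workers_planned"],
                                             node["Workers Planned"])
    return profile


if __name__ == "__main__":
    # Hand-written, heavily trimmed sample plan (not real output).
    sample = json.dumps([{"Plan": {
        "Node Type": "Sort",
        "Plans": [{
            "Node Type": "Gather", "Workers Planned": 4,
            "Plans": [{
                "Node Type": "Aggregate", "Strategy": "Hashed",
                "Plans": [{
                    "Node Type": "Hash Join",
                    "Plans": [{"Node Type": "Seq Scan"},
                              {"Node Type": "Hash",
                               "Plans": [{"Node Type": "Seq Scan"}]}],
                }],
            }],
        }],
    }}])
    print(plan_memory_profile(sample))
```

The three counts are the raw inputs for any worst-case memory estimate; concurrency and the actual hash_mem_multiplier setting still have to come from the server.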
The real culprits: SQL structure and statistics
The OOM case was not caused by a wrong parameter but by a query that called a PL/pgSQL function, performed a COPY, and then JOINed the result. Although syntactically valid, this design forced the executor to keep large intermediate data alive for the whole query.
Two fundamental patterns exacerbate the problem:
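Misestimated row counts: when the planner underestimates how many rows flow through a node, it may choose a hash‑based plan it expects to fit in memory, and the real data volume then drives memory use far past the plan's assumptions.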
Encapsulating logic in functions, CTEs, or nested sub‑queries creates a single, long‑lived execution pipeline, preventing timely memory release.
Database optimizers care about data flow, operator lifetimes, and cost models, not about code elegance.
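Misestimates can be spotted mechanically once EXPLAIN (ANALYZE, FORMAT JSON) output is available: compare each node's Plan Rows with its Actual Rows. The helper misestimated_nodes is a hypothetical sketch, and the 10× threshold is an arbitrary assumption. Both figures are per-loop values, so nodes executed many times are still compared on the same basis.

```python
# Flag plan nodes whose row estimate is far from reality, using
# EXPLAIN (ANALYZE, FORMAT JSON) output. The 10x threshold is arbitrary.
import json
from typing import Iterator, List, Tuple


def walk(node: dict) -> Iterator[dict]:
    """Yield a plan node and all of its descendants."""
    yield node
    for child in node.get("Plans", []):
        yield from walk(child)


def misestimated_nodes(explain_json: str,
                       factor: float = 10.0) -> List[Tuple[str, float, float]]:
    """Return (node type, estimated rows, actual rows) for bad estimates."""
    root = json.loads(explain_json)[0]["Plan"]
    flagged = []
    for node in walk(root):
        est, act = node.get("Plan Rows"), node.get("Actual Rows")
        if est is None or act is None:  # e.g. plain EXPLAIN without ANALYZE
            continue
        # Both figures are per loop; max(..., 1) guards against division by zero.
        ratio = max(est, act) / max(min(est, act), 1)
        if ratio >= factor:
            flagged.append((node["Node Type"], est, act))
    return flagged
```

A node estimated at 100 rows that actually produces 2,000,000 is flagged, while a scan estimated at 500 that returns 480 is not. Flagged nodes point to where ANALYZE, extended statistics or a rewrite will help more than a larger work_mem.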
How to manage work_mem effectively
DBAs should adopt a risk‑governance approach rather than relying on a single knob:
Control concurrency.
Fix inaccurate statistics.
Rewrite problematic SQL.
Limit parallelism.
Set appropriate statement timeouts.
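Monitor memory contexts with pg_log_backend_memory_contexts(pid), which writes the target backend's memory context summary to the server log (superuser only by default; from PostgreSQL 15, EXECUTE can be granted to other roles).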
Isolate heavy queries from peak traffic.
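pg_log_backend_memory_contexts writes one log line per context. Assuming the layout used by recent releases ("level: N; Name: T total in B blocks; ... ; U used"), a few lines of Python can rank contexts by size so that a runaway ExecutorState or HashTableContext stands out. The regular expression and the function name top_contexts are assumptions, and the sample lines are made up.

```python
# Rank memory contexts from pg_log_backend_memory_contexts() server-log lines.
# Assumes lines like "level: 1; ExecutorState: 8192 total in 1 blocks; ...; 6912 used".
import re
from collections import Counter
from typing import Iterable, List, Tuple

CONTEXT_RE = re.compile(r"level: (\d+); (.+?): (\d+) total in .*?; (\d+) used")


def top_contexts(log_lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Sum 'total' bytes per context name and return the n largest."""
    totals: Counter = Counter()
    for line in log_lines:
        match = CONTEXT_RE.search(line)
        if match:  # non-matching lines such as "Grand total: ..." are skipped
            totals[match.group(2)] += int(match.group(3))
    return totals.most_common(n)


if __name__ == "__main__":
    # Made-up sample lines; real ones come from the PostgreSQL server log.
    sample = [
        "LOG:  level: 0; TopMemoryContext: 100000 total in 6 blocks; 20000 free (5 chunks); 80000 used",
        "LOG:  level: 1; ExecutorState: 300000 total in 40 blocks; 1000 free (3 chunks); 299000 used",
        "LOG:  level: 2; HashTableContext: 500000 total in 60 blocks; 2000 free (1 chunks); 498000 used",
        "LOG:  Grand total: 900000 bytes in 106 blocks; 23000 free (9 chunks); 877000 used",
    ]
    for name, size in top_contexts(sample, n=2):
        print(f"{name}: {size} bytes")
```

Summing by name rather than by line groups repeated contexts (for example, several ExecutorState contexts from nested portals) into one figure; drop the aggregation if per-instance detail is needed.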
For developers, avoid nesting functions, CTEs, and sub‑queries that create long execution lifetimes, materialize intermediate results when possible, and recognize that “small” queries may become large execution pipelines.
When increasing work_mem is appropriate
Raising work_mem can be beneficial if all of the following hold:
Low and stable concurrency.
Workloads are primarily analytical with expensive sort/hash operations.
Sufficient memory headroom exists.
Parallelism is controlled.
Execution plans are stable.
The bottleneck is confirmed to be spilling to temporary files, not misestimation or bad SQL design.
If these conditions are not met, increasing work_mem is a gamble that can turn a slow query into a cluster‑wide failure.
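Before raising work_mem, confirm that spilling really is the bottleneck. This sketch scans EXPLAIN (ANALYZE, FORMAT JSON) output for Sort nodes whose Sort Space Type is Disk and Hash nodes that needed more than one batch. It is only a starting point under stated assumptions: the function name spilled_nodes is hypothetical, and hash aggregates and per-worker details report spills in other fields that are not checked here.

```python
# Find nodes that spilled to disk in EXPLAIN (ANALYZE, FORMAT JSON) output.
# Only Sort and Hash nodes are checked; hash aggregates and per-worker
# details report spills in other fields and are ignored here.
import json
from typing import Iterator, List


def walk(node: dict) -> Iterator[dict]:
    """Yield a plan node and all of its descendants."""
    yield node
    for child in node.get("Plans", []):
        yield from walk(child)


def spilled_nodes(explain_json: str) -> List[str]:
    root = json.loads(explain_json)[0]["Plan"]
    findings = []
    for node in walk(root):
        if node["Node Type"] == "Sort" and node.get("Sort Space Type") == "Disk":
            findings.append(f"Sort spilled {node.get('Sort Space Used')} kB to disk")
        elif node["Node Type"] == "Hash" and node.get("Hash Batches", 1) > 1:
            findings.append(f"Hash used {node['Hash Batches']} batches")
    return findings
```

An empty result on a slow query suggests the time is going somewhere other than temporary files, in which case a larger work_mem adds risk without addressing the cause.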
Final warning
Even with 2 TB of RAM, a poorly written query can crash the system. Mature PostgreSQL teams treat work_mem as a per‑node budget and a memory‑disk threshold, not as a universal performance switch. Misusing it turns a lever into a hammer that can smash the whole database.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
