How False Sharing in CPU Cache Slowed Down Our Database Table Scans—and How We Fixed It
During a client’s upgrade test, full-table scans on a database’s compressed tables slowed down severely under concurrency. We traced the slowdown to CPU cache-line false sharing in the decompression code: using Linux perf tools we identified the hotspot, aligned the affected memory to a cache-line boundary, and restored performance.
Background
A financial‑industry client upgraded to a newer version of our database and observed that full‑table scans on compressed tables became dramatically slower as the number of concurrent scans increased, while uncompressed tables showed no such degradation. The issue was unexpected because the compression feature had been stable for years.
Investigation Approach
Because we could not access the client’s production system, we reproduced the problem locally. Initial hypotheses focused on configuration parameters, but disabling a compression‑related setting only yielded modest improvement. The next suspect was lock contention, yet system‑wide lock statistics showed no significant contention. The investigation then turned to CPU‑cache behavior.
Using Linux perf to Locate Hotspots
We employed the Linux perf suite:
perf stat -e task-clock -e cycles -e stalled-cycles-frontend -e stalled-cycles-backend -e cache-misses -e L1-dcache-loads -e L1-dcache-load-misses -e L1-dcache-stores -e L1-dcache-store-misses -e L1-dcache-prefetches -e L1-dcache-prefetch-misses -e LLC-loads -e LLC-load-misses -p 29261 sleep 50
The output showed that stalled CPU cycles exceeded 80%.
perf record -p 29261 -g sleep 50
perf report > perf_record.out
Analyzing the report revealed two hotspot functions involved in data decompression. Using perf annotate, we drilled down to a single line that consumed 67% of CPU time:
perf annotate --symbol=decompress_xxx_by_yyy
The corresponding source line (variables renamed for IP protection) was:
new_index = decompressinfo->xxx_arrayC[decompressinfo->xxx_arrayB[index]];
And the array access itself:
xxx_arrayC[temp_index]
Root Cause: False Sharing
The decompressinfo structure (≈48 bytes) is allocated from a memory pool and contains several pointer arrays. Because the structure is smaller than a typical 64-byte cache line and was not aligned to a line boundary, the xxx_arrayC region can share a cache line with adjacent allocations. On a multi-core NUMA system, concurrent tasks writing to that neighboring memory cause the shared cache line to bounce between cores. This is a classic false-sharing scenario that inflates memory latency and stalls CPU cycles.
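To make the layout argument concrete, here is a minimal sketch in C that checks whether neighboring pool objects touch a common cache line. It assumes a 64-byte line (common on x86-64, but not stated in the original), and the struct, function names, base addresses, and strides are all hypothetical stand-ins for decompressinfo and the real memory pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; verify this for the target CPU. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical stand-in for the ~48-byte decompressinfo structure. */
struct fake_decompressinfo {
    unsigned char bytes[48];
};
static_assert(sizeof(struct fake_decompressinfo) == 48,
              "stand-in must match the size quoted in the article");

/* Index of the cache line that contains the given address. */
static uintptr_t cache_line_of(uintptr_t addr)
{
    return addr / CACHE_LINE_SIZE;
}

/* True if byte ranges [a, a+a_len) and [b, b+b_len) touch a common line.
 * Both lengths must be non-zero. */
static bool ranges_share_cache_line(uintptr_t a, size_t a_len,
                                    uintptr_t b, size_t b_len)
{
    uintptr_t a_first = cache_line_of(a);
    uintptr_t a_last  = cache_line_of(a + a_len - 1);
    uintptr_t b_first = cache_line_of(b);
    uintptr_t b_last  = cache_line_of(b + b_len - 1);
    return a_first <= b_last && b_first <= a_last;
}

/* Do pool objects i and i+1 share a line, given the pool's base address
 * and the distance (stride) between consecutive objects? */
static bool neighbours_share_line(uintptr_t base, size_t stride,
                                  size_t obj_size, size_t i)
{
    uintptr_t a = base + i * stride;
    uintptr_t b = a + stride;
    return ranges_share_cache_line(a, obj_size, b, obj_size);
}
```

With a line-aligned base and a 48-byte stride, objects 0 and 1 both touch line 0, so a write to one can invalidate the other. With a 64-byte stride from an aligned base they occupy separate lines. A 64-byte stride from a misaligned base (for example offset 32) still overlaps, which is why both the start address and the padded size matter.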
[Figure: illustration of the memory layout and cache-line interaction]
Fix Implementation
We realigned the decompressinfo allocation to a cache‑line boundary, ensuring that its fields no longer share a line with other concurrently written data. After rebuilding and redeploying the database, we re‑ran the perf suite.
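The original does not show the allocation code, so the following is only a sketch of what a cache-line-aligned allocation could look like in C11. It assumes a 64-byte cache line and uses the standard aligned_alloc function in place of the database's own memory pool; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 64-byte cache lines; verify this for the target CPU. */
#define CACHE_LINE_SIZE 64u
static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0,
              "alignment must be a power of two");

/* Round size up to the next multiple of the cache-line size. */
static size_t round_up_to_cache_line(size_t size)
{
    return (size + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
}

/* Allocate zeroed memory that starts on a cache-line boundary and spans
 * whole cache lines, so no other allocation can land on the same line.
 * Returns NULL on failure or for size 0; release with free(). */
static void *alloc_cache_aligned(size_t size)
{
    /* Reject empty requests and sizes whose rounding would overflow. */
    if (size == 0 || size > SIZE_MAX - (CACHE_LINE_SIZE - 1)) {
        return NULL;
    }
    size_t padded = round_up_to_cache_line(size);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    void *p = aligned_alloc(CACHE_LINE_SIZE, padded);
    if (p != NULL) {
        memset(p, 0, padded);
    }
    return p;
}
```

Rounding the size up matters as much as aligning the start address: an aligned 48-byte object could still leave the last 16 bytes of its line available to the next allocation.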
Results
[Figures: perf measurements before optimization and after optimization]
The hotspot line’s CPU consumption dropped dramatically, and overall metrics improved:
Stalled CPU cycles decreased markedly, indicating better CPU utilization.
L1‑dcache loads increased, showing more data fetched from low‑latency cache.
LLC loads fell, confirming reduced reliance on slower L3 cache.
Conclusion
The investigation demonstrates that subtle false‑sharing issues can cause severe performance degradation in high‑concurrency database workloads. Careful use of profiling tools, understanding of CPU cache architecture, and proper memory alignment are essential for diagnosing and fixing such problems.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
