Why Oracle Log File Sync Bottlenecks Appear and How to Eliminate Them

During high-concurrency flash-sale events, Oracle's log file sync wait became a performance bottleneck. This article analyzes the storage, OS, and Oracle Disk Manager (ODM) factors behind it, presents AWR metrics, walks through the tuning steps (including enabling ODM and disabling adaptive log file sync), and shows the measured latency reductions.

dbaplus Community

Introduction

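Relational databases rely on ACID transactions, and Oracle writes redo information to a log before writing data (write-ahead logging). Log writes are sequential I/O, while data-file writes are random I/O, which is far slower on mechanical disks, so a commit only has to wait for the sequential redo write. That wait is the log file sync event: the committing session stays blocked until LGWR has flushed its redo to the online redo log. Under high-concurrency workloads such as flash-sale events, it can become a performance bottleneck.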

Problem Observation

At midnight, a flash sale started with tens of thousands of concurrent users. The connection pool was exhausted, CPU utilization reached 100%, and wait events piled up. A 15-minute AWR snapshot showed:

Redo size 11.8 MB/s

~1,612 transactions per second

~4.3 × 10⁴ executions per second

Log file sync contributed 12.1 % of DB time

The AWR report also showed the average wait times: 44 ms for log file sync and 9 ms for log file parallel write.
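
The same two wait events can be checked outside of AWR. The query below is a minimal sketch, assuming access to the standard v$system_event view. Its figures are cumulative since instance startup, so they will not match a 15-minute AWR interval exactly.

```sql
-- Average wait per event, in milliseconds, since instance startup.
-- NULLIF guards against division by zero when an event has no waits yet.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');
```

A large gap between the two averages suggests that time is being lost outside the redo write itself, for example in CPU scheduling or in the commit notification path.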

Analysis and Measures

1. Storage Layer

LGWR writes to the online redo log. The storage SLA promised 0.5 ms write latency, but observed latency often reached 1-3 ms, especially for large redo writes. Switching to a faster storage system brought no noticeable improvement, because the underlying file system (VxFS) added its own locking (vx_rwsleep_rec_lock), which limited throughput.
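
An average hides how often redo writes exceed the SLA. As a rough sketch, assuming the standard v$event_histogram view is available, the distribution of LGWR write latency can be listed per bucket:

```sql
-- Each row counts waits that were shorter than wait_time_milli ms
-- and at least as long as the previous bucket (cumulative since startup).
SELECT wait_time_milli,
       wait_count
FROM   v$event_histogram
WHERE  event = 'log file parallel write'
ORDER  BY wait_time_milli;
```

If most waits land in the 1 ms bucket but a visible tail sits at 2 ms, 4 ms, and above, that matches the pattern described here, where latency is usually acceptable but degrades for some writes.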

2. Operating System Layer

Tracing LGWR with truss showed it blocking in KAIO() and pwrite(). A pwrite call normally completed in 0.0017 ms, but under load it could exceed 1.5 s. The VxFS function vx_rwsleep_rec_lock() also blocked LGWR. DTrace was then used to capture the LGWR call stack and confirm these contention points.

3. Oracle Disk Manager (ODM) Background

ODM bypasses the file-system cache and file-system locking, letting Oracle issue direct I/O to the underlying volumes. This can deliver performance comparable to raw devices while keeping the manageability of file-system storage.

4. Enabling ODM

Enabling ODM improves write latency, but physical reads no longer benefit from the OS cache. After ODM was enabled, db file sequential read latency rose from ~2 ms to ~6 ms, although overall transaction throughput improved.

5. Effect of Enabling ODM

Log file parallel write latency dropped from ~1 ms to 0.3 ms, and log file sync latency fell from >1.5 ms to <1 ms. The increase in db file sequential read wait time did not noticeably affect application response.

6. Other Influencing Factors

Other factors also influence log file sync: process priority (for example, setting _high_priority_processes for LGWR), log file switch frequency (affected by log_buffer and the online redo log size), and mismatched archive log sizes.
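
Log switch frequency is easy to measure before resizing anything. The following is a small sketch, assuming the standard v$log_history view; the 24-hour window and the alias names are illustrative choices, not values from the original analysis.

```sql
-- Number of redo log switches per hour over the last 24 hours.
SELECT TRUNC(first_time, 'HH24') AS switch_hour,
       COUNT(*)                  AS log_switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TRUNC(first_time, 'HH24')
ORDER  BY switch_hour;
```

Hours in which the switch count spikes during peak load are the ones to compare against the log file sync figures.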

7. Recommended Configuration

16 MB ≤ log_buffer ≤ min(128 MB, max(AUTO_SIZE, 16 MB))

300 MB ≤ online redo log file size ≤ 1024 MB

AUTO_SIZE = (cpu_count/16) × (cpu_count × 128)

These ranges balance buffer size, strand count, and I/O characteristics for typical Oracle deployments.
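
To compare an instance against these ranges, the current settings can be read from the data dictionary. This is a minimal sketch using the standard v$parameter and v$log views; it only reports values and changes nothing.

```sql
-- Current log_buffer (reported in bytes) and cpu_count.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('log_buffer', 'cpu_count');

-- Size of each online redo log group, in MB.
SELECT group#,
       thread#,
       members,
       bytes / 1024 / 1024 AS size_mb,
       status
FROM   v$log
ORDER  BY thread#, group#;
```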

Disabling Adaptive Log File Sync

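When log file sync latency is high (7-8 ms) while log file parallel write remains low (1-3 ms), the write itself does not explain the gap, and the adaptive log file sync mechanism may be the cause. With this feature enabled, Oracle can switch committing sessions from being posted by LGWR to polling for completion, which can add latency to each commit. To disable it, execute: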

alter system set "_use_adaptive_log_file_sync"=false scope=both;

After disabling it, the average log file sync wait time typically drops from ~7 ms to ~3 ms, and the related polling counters in AWR or v$sysstat show no further activity.
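
The change can be verified along the following lines. This is a sketch under two assumptions: the session is connected as SYS (the x$ tables are not visible otherwise), and the counters of interest are the "redo synch" statistics in v$sysstat, which the LIKE filter lists without relying on exact statistic names.

```sql
-- Current value of the hidden parameter (requires a SYS connection).
SELECT i.ksppinm  AS parameter_name,
       v.ksppstvl AS current_value
FROM   x$ksppi  i
JOIN   x$ksppcv v ON v.indx = i.indx
WHERE  i.ksppinm = '_use_adaptive_log_file_sync';

-- Redo synch statistics; values are cumulative since instance startup,
-- so compare two samples taken some minutes apart.
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE 'redo synch%'
ORDER  BY name;
```

Because v$sysstat is cumulative, the polling-related rows will not reset after the change; what matters is that they stop growing between samples.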

Final Results

With ODM enabled, adaptive log file sync disabled, and storage and OS parameters tuned, the average log file sync wait time under peak load fell to ~2 ms, while other metrics (read IOPS, write IOPS, read response time) improved significantly. The database became stable enough to handle more than 70,000 SQL executions per second and 3,000 transactions per second during subsequent flash-sale events.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
