21.6.26

How to Troubleshoot and Fix Oracle Database 'log file sync' Wait Events

 If your Oracle Database is experiencing sudden performance degradation, users are complaining about slow application response times, and your Automatic Workload Repository (AWR) report highlights "log file sync" as a top wait event, you are dealing with a critical transaction processing bottleneck.

When this wait event spikes, it directly limits your database's transaction throughput. In this comprehensive guide, we will break down exactly what the log file sync event is, its root causes, and how to systematically diagnose and resolve it.
What is the 'log file sync' Wait Event?
When a user session issues a COMMIT or ROLLBACK statement, the server process must ensure that the transaction's redo information is safely written from the Redo Log Buffer in the SGA to the physical Redo Log Files on disk.
The server process signals the Log Writer (LGWR) background process to flush the buffer. The server process then waits on the log file sync event until LGWR confirms that the write operation to disk is complete.
Identifying the Symptoms in AWR or Top Activity
Before changing any configuration, verify the problem using an AWR, ASH, or Statspack report. Look for the following red flags in the Top 10 Foreground Wait Events section:
  • The log file sync event appears near the top of the list.
  • The Avg Wait (ms) exceeds 10-15 milliseconds (Ideally, it should be under 5ms, especially on modern SSD/NVMe storage).
  • High percentage of database time (% DB time) is consumed by this single event.
Step-by-Step Troubleshooting Framework
To isolate the root cause, you need to check whether the delay is caused by slow physical disk I/O, an overworked LGWR process, or application-level design flaws.
Step 1: Differentiate Between LGWR I/O and CPU Bottlenecks
Compare the average wait time of log file sync against the log file parallel write wait event (which tracks the actual time LGWR spends writing to disk).
  • Scenario A: Both log file sync and log file parallel write are high.
    • Meaning: The bottleneck is physical disk I/O. Your storage subsystem cannot write the redo data fast enough.
  • Scenario B: log file sync is high, but log file parallel write is very low.
    • Meaning: LGWR is writing to disk quickly, but there is a delay in communication, or the OS is suffering from CPU starvation, preventing LGWR from getting scheduled quickly.
Step 2: Evaluate Application Commit Frequency
Examine the Instance Activity Stats section of your AWR report. Check the ratio of user commits + user rollbacks relative to the total transaction volume.
  • If your application issues a COMMIT after every single row insert inside a loop (row-by-row processing), it forces LGWR to work constantly, causing a massive backup.
Step 3: Check for High OS CPU/Memory Utilization
If the server hosting the database is running at 100% CPU capacity, the operating system cannot allocate CPU cycles to the LGWR process in a timely manner. This delays the confirmation signal sent back to the user session.
Practical Solutions to Fix 'log file sync'
1. Optimize Storage Performance (Fixes Physical I/O)
  • Move Redo Logs to Fast Disk: Relocate your online redo log files to your fastest available storage (SSD, NVMe, or flash storage arrays).
  • Avoid RAID 5: Never place online redo logs on a RAID 5 configuration due to the parity write penalty. Use RAID 1+0 (Mirroring + Striping) instead.
  • Isolate Redo Disk Groups: Ensure redo log disks are separate from data files or archive log destinations to avoid I/O contention.
2. Tune Application Logic (Fixes High Commit Frequency)
  • Implement Batch Commits: Modify application code to use array processing and commit in batches (e.g., every 1,000 or 10,000 rows) rather than after every single row.
  • Use PL/SQL Collections: Utilize FORALL statements with the BULK COLLECT clause to reduce context switching and commit overhead.
3. Adjust Oracle Database Initialization Parameters
  • Log Buffer Size: Ensure your LOG_BUFFER initialization parameter is sufficiently sized (typically between 32MB and 128MB for modern systems) to prevent buffer allocation bottlenecks.
  • Asynchronous I/O: Verify that DISK_ASYNCH_IO = TRUE is enabled at the database instance and operating system levels to allow parallel processing.
  • Consider COMMIT_WRITE / COMMIT_LOGGING (Advanced): For non-critical, high-throughput batch loads where minor data loss during a crash is acceptable, you can dynamically alter your session settings:
    sql
    ALTER SESSION SET COMMIT_WRITE = 'BATCH,NOWAIT';


    Resolving log file sync wait events requires a balanced approach. Always start by comparing it against log file parallel write to quickly determine whether your solution lies in upgrading storage infrastructure or fixing inefficient application design. By optimizing commit frequencies and placing redo logs on highly responsive disks, you can unlock massive performance gains across your Oracle environment.

No comments:

Post a Comment

Really Thanks