Weekend IO Delays in Private Cloud? Diagnose and Mitigate RAID Consistency Checks
A private‑cloud environment experiences recurring weekend I/O delay alerts caused by RAID controller consistency checks (CC) and patrol reads (PR). This guide explains the underlying mechanism, how to confirm the root cause, and the StorCLI commands for adjusting or disabling these checks.
Phenomenon
Private‑cloud platforms report weekly “qbd io delay IO” or “SLOW IO” alerts that start at a fixed weekend time (typically 11:00 AM). The symptoms are:
Alert always begins at the same weekend time.
All RAID virtual disks on every storage node are affected.
Only virtual disks managed by the RAID controller show degradation; SSDs or disks not managed by the RAID card are unaffected.
IOPS and bandwidth remain stable, but write latency spikes for several seconds.
Principle
Consistency Check (CC)
LSI RAID controllers provide a Consistency Check (CC) feature that validates mirrored or parity data for RAID levels 1, 5, 6, 10, 50 and 60. For RAID 1 it checks mirror consistency; for RAID 5 it checks parity integrity.
CC Execution Schedule
By default CC runs once every 168 hours (weekly) at controller time 03:00 AM on Saturday, which corresponds to 11:00 AM on the host. The check is performed in parallel on all logical drives.
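The gap between controller time 03:00 and host time 11:00 is a fixed 8-hour offset. One plausible explanation is a controller clock on UTC and a host in UTC+8, but that is an inference from the figures above, not something stated by the vendor. A minimal sketch of the conversion, using two hypothetical helper functions:

```shell
# Convert an hour on the controller clock to the host's local hour.
# offset_hours is an assumption: 8 matches the 03:00 -> 11:00 example above.
# Compare `storcli /c0 show time` with the host clock to find the real value.
# Pass hours without leading zeros (3, not 03) to avoid octal parsing.
controller_to_host_hour() {
    ctrl_hour=$1
    offset_hours=$2
    # Add 24 before the modulo so negative offsets still yield 0-23.
    echo $(( (ctrl_hour + offset_hours % 24 + 24) % 24 ))
}

# Inverse: the controller hour that produces a desired host-local hour.
host_to_controller_hour() {
    host_hour=$1
    offset_hours=$2
    echo $(( (host_hour - offset_hours % 24 + 24) % 24 ))
}
```

With an offset of 8, a desired host start of 02:00 maps to controller hour 18. The calendar day on the controller may then differ from the host's, so check the date as well when setting a start time.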
Impact on Disk Performance
Duration: The execution time varies with RAID group size, RAID type and number of backend disks, typically lasting 2–10 hours.
Severity: During CC (and Patrol Read, PR), write latency can reach several seconds, up to roughly ten seconds, and disk utilization may rise to 60‑80%.
Reference: https://www.percona.com/blog/2008/03/05/raid-system-performance-surprises/
Verification that CC/PR Causes the I/O Delay
Confirm the issue always starts at a fixed weekend time (e.g., supervisor logs show crashes on Saturdays/Sundays).
Verify that all storage nodes’ RAID virtual disks are impacted.
Check that only RAID‑card‑managed virtual disks show degradation while non‑RAID disks remain normal.
Determine whether a CC or PR operation was running at the time of the incident.
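The last check above can be reduced to a time-window comparison once the CC or PR start time is known. The sketch below assumes the operator supplies the window length (the 2–10 hour range quoted earlier is only a typical figure). in_check_window is a hypothetical helper, not a StorCLI feature.

```shell
# Succeed (exit status 0) if an incident falls inside a CC/PR window.
# Arguments: incident time, window start time (both Unix epoch seconds),
# and an assumed window length in hours.
in_check_window() {
    incident=$1
    start=$2
    window_hours=$3
    end=$(( start + window_hours * 3600 ))
    # The window is half-open: start <= incident < end.
    [ "$incident" -ge "$start" ] && [ "$incident" -lt "$end" ]
}

# Example (GNU date assumed for the epoch conversion):
#   start=$(date -d "2023-03-18 11:00" +%s)
#   alert=$(date -d "2023-03-18 11:07" +%s)
#   in_check_window "$alert" "$start" 10 && echo "alert overlaps CC window"
```

An overlap is only circumstantial evidence. Confirming that the controller was actually running CC or PR at that moment still requires the controller's own status output or event log.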
StorCLI Tool
Newer RAID firmware (as of 2023‑03‑06) no longer supports MegaCLI; use StorCLI instead. Download the binary from the vendor’s official site.
Mitigation Measures
While CC or PR is enabled, its impact cannot be eliminated entirely, but the following adjustments can reduce the severity:
Shift CC execution to off‑peak hours.
Increase the interval between CC runs (e.g., from weekly to every 2‑3 weeks).
Lower CC intensity (e.g., from the default 30 % to 5‑15 %).
Change CC execution mode from parallel to serial.
Shift PR execution to off‑peak hours.
Increase the PR interval (e.g., from weekly to every 2‑3 weeks).
Adjust PR intensity.
Note: Reducing intensity lengthens the execution window; it makes latency spikes less frequent rather than eliminating them.
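Several of these adjustments are usually applied together, and the interval must be given in hours rather than weeks. The sketch below only prints the StorCLI commands for review and does not run them. print_cc_pr_plan is a hypothetical helper, and the choice of one shared start time and one shared rate for CC and PR is an assumption made to keep the example short.

```shell
# Print (do not execute) StorCLI commands for a gentler CC/PR schedule.
# Arguments: controller index, interval in weeks, start time in controller
# time ("yyyy/mm/dd hh"), and rate as a percentage.
print_cc_pr_plan() {
    ctrl=$1
    weeks=$2
    start=$3
    rate=$4
    delay=$(( weeks * 168 ))   # 168 hours per week, so 2 weeks = 336
    printf '%s\n' "storcli /c${ctrl} set cc=seq delay=${delay} starttime=\"${start}\""
    printf '%s\n' "storcli /c${ctrl} set ccrate=${rate}"
    printf '%s\n' "storcli /c${ctrl} set pr starttime=\"${start}\" delay=${delay}"
    printf '%s\n' "storcli /c${ctrl} set prrate=${rate}"
}

# Example:
#   print_cc_pr_plan 0 2 "2023/03/18 03" 10
```

Printing first makes it easier to review the plan for every controller on every storage node before anything is changed.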
StorCLI Commands for Adjustments
In the examples below, /cX refers to RAID controller X (e.g., /c0 is the first controller).
Set CC execution mode to serial:
storcli /c0 set cc=seq
Set CC start time (only when CC is enabled):
storcli /c0 set cc starttime="yyyy/mm/dd hh"
Set CC interval (delay in hours, 336 h = 2 weeks):
storcli /c0 set cc delay=336
Set CC intensity (percentage of I/O bandwidth used by CC):
storcli /c0 set ccrate=10
Set PR interval:
storcli /c0 set pr delay=336
Set PR start time:
storcli /c0 set pr starttime="yyyy/mm/dd hh"
Set PR intensity:
storcli /c0 set prrate=10
Combined example adjusting both CC and PR:
storcli /c0 set cc=seq delay=336 starttime="2023/03/18 03"
storcli /c0 set ccrate=10
storcli /c0 set pr starttime="2023/03/18 03" delay=336
storcli /c0 set prrate=10
Common Operations
Query controller time:
storcli /c0 show time
Query all logical drive information:
storcli /c0/vall show
CC related commands:
Show CC schedule and status:
storcli /c0 show cc
Show LD CC status:
storcli /c0/v0 show cc
Pause LD CC:
storcli /c0/v0 pause cc
Stop LD CC:
storcli /c0/v0 stop cc
Disable CC:
storcli /c0 set cc=off
Resume LD CC:
storcli /c0/v0 resume cc
PR related commands (RAID‑card level):
Show PR status:
storcli /c0 show pr
Pause PR:
storcli /c0 pause pr
Stop PR:
storcli /c0 stop pr
Disable PR:
storcli /c0 set pr=off
Resume PR:
storcli /c0 resume pr
Other References
Server RAID card Consistency Check introduction: https://bbs.huaweicloud.com/forum/forum.php?mod=viewthread&tid=108949