
How to Choose, Configure, and Monitor RAID for Production Systems

This guide walks through RAID fundamentals, the performance and reliability trade‑offs of each RAID level, real‑world selection criteria, step‑by‑step Linux (mdadm) and hardware RAID configuration, monitoring and alerting scripts, troubleshooting procedures, and best‑practice recommendations for modern storage environments.


RAID Overview

RAID (Redundant Array of Independent Disks) aggregates multiple physical disks to provide higher performance, data protection, or both. The main driver for using RAID is to mitigate the risk of disk failure (average annual failure rate ~2% per disk) and to overcome the performance limits of a single drive.

Two broad categories exist:

Hardware RAID : Dedicated RAID controller with its own CPU, cache, and optional battery/capacitor backup. Offloads RAID calculations from the host.

Software RAID : Managed by the operating system (Linux mdadm, Windows Dynamic Disks). No extra hardware cost, but consumes CPU cycles and lacks battery‑backed cache.

Fake/BIOS RAID (firmware RAID that still performs its calculations on the host CPU behind a vendor driver) is generally discouraged for production workloads.

RAID Levels

RAID 0 (Striping) : Data is striped across all disks. Capacity = N × disk size. Read/write performance scales linearly. No redundancy – a single disk failure destroys the array. Use only for temporary high‑throughput workloads where data loss is acceptable.

RAID 1 (Mirroring) : Identical copies on two (or more) disks. Capacity = 50 % of raw space. Read can be up to 2× faster; write speed similar to a single disk. Can survive one disk failure (two‑disk mirror) or more with additional mirrors.

RAID 5 (Single Parity) : Striping with distributed parity. Capacity = (N‑1) × disk size. Read performance is high; small writes incur a read‑modify‑write penalty (each write costs roughly four disk I/Os). Tolerates a single disk failure. Rebuild time can be long on large disks.

RAID 6 (Double Parity) : Two independent parity blocks. Capacity = (N‑2) × disk size. Can survive two simultaneous disk failures. Write penalty higher than RAID 5.

RAID 10 (Mirrored Stripes) : Combines RAID 1 and RAID 0. Capacity = 50 % of raw space. Provides both high read/write performance and redundancy. Recommended for databases and virtualization.

RAID 50 / RAID 60 : RAID 5 or RAID 6 groups striped together. Used for very large storage pools where a balance of capacity, performance, and fault tolerance is needed.

JBOD : Disks presented as a single logical volume without redundancy or performance gain. Suitable for cold backup or archival storage.

Linux Software RAID (mdadm) – Practical Steps

Environment preparation

# List disks
lsblk
# Verify they are empty
fdisk -l /dev/sdb /dev/sdc /dev/sdd /dev/sde
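If the disks were used in a previous array, clearing leftover signatures prevents mdadm from auto‑assembling stale metadata. A minimal sketch, assuming /dev/sdb through /dev/sde are the disks to reuse (destructive; double‑check device names first):

# Remove old filesystem/RAID signatures (destructive!)
wipefs -a /dev/sdb /dev/sdc /dev/sdd /dev/sde
# Clear any previous mdadm superblock
mdadm --zero-superblock /dev/sdb /dev/sdc /dev/sdd /dev/sde 2>/dev/null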

Create RAID 0

mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde
cat /proc/mdstat
mkfs.xfs /dev/md0
mkdir -p /data/raid0
mount /dev/md0 /data/raid0
echo '/dev/md0 /data/raid0 xfs defaults 0 0' >> /etc/fstab
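As an optional sanity check, you can measure sequential throughput on the new array, sketched here with fio (assumes fio is installed and /data/raid0 is mounted as above):

# 1 GiB direct sequential write in 1 MiB blocks
fio --name=seqwrite --directory=/data/raid0 --rw=write \
    --bs=1M --size=1G --direct=1 --group_reporting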

Create RAID 1

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mkfs.ext4 /dev/md1
mkdir -p /data/raid1
mount /dev/md1 /data/raid1
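On a test machine it is worth proving that the mirror survives losing a leg. A lab‑only sketch that simulates a failure and re‑adds the disk:

mdadm --manage /dev/md1 --fail /dev/sdb     # simulate a disk failure
cat /proc/mdstat                            # array stays online, degraded
mdadm --manage /dev/md1 --remove /dev/sdb
mdadm --manage /dev/md1 --add /dev/sdb      # re-add; the mirror resyncs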

Create RAID 5 with a hot spare

mdadm --create /dev/md5 --level=5 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde
mdadm --add /dev/md5 /dev/sdf   # hot spare
mdadm --detail /dev/md5
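Alternatively, the spare can be enrolled at creation time with --spare-devices, avoiding the separate --add step:

mdadm --create /dev/md5 --level=5 --raid-devices=4 --spare-devices=1 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf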

Create RAID 10

mdadm --create /dev/md10 --level=10 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde
# Alternative at creation time: explicit layout (near/far/offset), e.g. far-2
mdadm --create /dev/md10 --level=10 --layout=f2 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

Persist configuration

# Save to mdadm.conf (path is /etc/mdadm/mdadm.conf on Debian/Ubuntu)
mdadm --detail --scan >> /etc/mdadm.conf
# Update initramfs (CentOS/RHEL)
dracut -f
# Update initramfs (Debian/Ubuntu)
update-initramfs -u

Hardware RAID – StorCLI, Dell PERC, HP SmartArray

StorCLI installation (2025 version)

wget https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/storcli_007.2705.0000.0000_linux.zip
unzip storcli_*.zip
cd storcli_*
rpm -ivh storcli-*.rpm
ln -s /opt/MegaRAID/storcli/storcli64 /usr/local/bin/storcli

Basic commands

# Show controllers
storcli show
# Show physical disks
storcli /c0 /eall /sall show
# Create RAID 0
storcli /c0 add vd type=r0 drives=252:0-3
# Create RAID 1
storcli /c0 add vd type=r1 drives=252:0-1
# Create RAID 5 with a hot spare
storcli /c0 add vd type=r5 drives=252:0-3 spares=252:4
# Create RAID 10 (two mirrored spans)
storcli /c0 add vd type=r10 drives=252:0-3 pdperarray=2
# Set write-back and read-ahead cache (requires a healthy BBU)
storcli /c0 /v0 set wrcache=wb
storcli /c0 /v0 set rdcache=ra
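A few read‑only health checks worth running after creation (command names per Broadcom's StorCLI reference; exact output columns vary by firmware):

# Controller summary, including cache and BBU state
storcli /c0 show all
# Battery/supercap details
storcli /c0 /bbu show all
# Virtual disk states (Optl = optimal, Dgrd = degraded)
storcli /c0 /vall show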

Dell PERC's perccli is a rebranded StorCLI and accepts the same syntax. HP SmartArray's ssacli uses its own command set but covers the same operations (create, monitor, cache policy).
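For reference, a roughly equivalent ssacli sequence might look like this sketch (slot number and drive addresses are placeholders; list yours with ssacli ctrl all show config):

# Show controllers and current configuration
ssacli ctrl all show config
# Create a RAID 1 logical drive from two physical drives
ssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=1
# Check logical drive status
ssacli ctrl slot=0 ld all show status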

Monitoring and Automation

Software RAID monitor (mdraid_monitor.sh)

#!/bin/bash
ALERT_EMAIL="admin@example.com"   # replace with your ops mailbox
LOG_FILE="/var/log/mdraid_monitor.log"

timestamp(){ date "+%Y-%m-%d %H:%M:%S"; }
log(){ echo "[$(timestamp)] $1" | tee -a "$LOG_FILE"; }
send_alert(){
  local subject="$1" message="$2"
  echo "$message" | mail -s "$subject" "$ALERT_EMAIL"
  log "ALERT: $subject"
}

check_raid_status(){
  if [ ! -f /proc/mdstat ]; then log "No software RAID detected"; return 0; fi
  local has_issue=0 issues=""
  for md in /dev/md*; do
    [ -b "$md" ] || continue
    local detail
    detail=$(mdadm --detail "$md" 2>/dev/null) || continue
    local state=$(echo "$detail" | grep "State :" | awk -F: '{print $2}' | xargs)
    local failed=$(echo "$detail" | grep "Failed Devices" | awk -F: '{print $2}' | xargs)
    log "Checking $(basename $md): State=$state Failed=$failed"
    if [[ $state == *degraded* ]] || [[ $state == *FAILED* ]] || [ "${failed:-0}" -gt 0 ]; then
      has_issue=1
      issues+="
$(basename $md): State=$state Failed=$failed"
    fi
    if [[ $state == *recovering* ]] || [[ $state == *resyncing* ]]; then
      local prog=$(grep -A3 "^$(basename $md) :" /proc/mdstat | grep -E 'recovery|resync' | grep -oP '\d+\.\d+%')
      log "$(basename $md): Rebuilding $prog"
    fi
  done
  if [ $has_issue -eq 1 ]; then
    send_alert "[CRITICAL] RAID Issue on $(hostname)" "Issues:$issues
Full status:
$(cat /proc/mdstat)"
    return 1
  else
    log "All RAID arrays are healthy"
    return 0
  fi
}

check_disk_smart(){
  log "Checking disk SMART status..."
  for disk in /dev/sd?; do
    [ -b "$disk" ] || continue
    local smart=$(smartctl -H "$disk" 2>/dev/null | grep "SMART overall-health" | awk -F: '{print $2}' | xargs)
    if [ -n "$smart" ] && [ "$smart" != "PASSED" ]; then
      send_alert "[WARNING] Disk SMART Failure on $(hostname)" "Disk $disk SMART status: $smart"
    fi
    local reallocated=$(smartctl -A "$disk" 2>/dev/null | awk '/Reallocated_Sector_Ct/ {print $10}')
    if [ -n "$reallocated" ] && [ $reallocated -gt 100 ]; then
      send_alert "[WARNING] Disk Degradation on $(hostname)" "Disk $disk has $reallocated reallocated sectors"
    fi
  done
}

main(){
  log "===== RAID Monitor Start ====="
  check_raid_status
  check_disk_smart
  log "===== RAID Monitor End ====="
}
main

Hardware RAID monitor (hwraid_monitor.sh) follows the same pattern, detecting the installed CLI (StorCLI, perccli, ssacli) and checking controller health, BBU status, virtual disk state, physical disk state, and rebuild progress. Alerts are sent via email.
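A minimal sketch of that detection pattern, assuming StorCLI‑compatible tools and a single controller at /c0 (the Dgrd/OfLn state strings follow StorCLI's virtual disk state column and may differ across firmware versions):

#!/bin/bash
# Pick whichever vendor CLI is installed (perccli shares StorCLI syntax)
ALERT_EMAIL="admin@example.com"
if   command -v storcli64 >/dev/null 2>&1; then CLI=storcli64
elif command -v perccli64 >/dev/null 2>&1; then CLI=perccli64
else echo "No supported RAID CLI found" >&2; exit 1
fi

# Alert if any virtual disk is not optimal
if "$CLI" /c0 /vall show 2>/dev/null | grep -Eq 'Dgrd|OfLn'; then
  "$CLI" /c0 /vall show | mail -s "[CRITICAL] HW RAID degraded on $(hostname)" "$ALERT_EMAIL"
fi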

Prometheus Exporter (python)

#!/usr/bin/env python3
"""RAID Prometheus Exporter – exposes RAID status as metrics"""
from prometheus_client import start_http_server, Gauge
import re, time

RAID_ARRAY_STATUS = Gauge('raid_array_status','RAID array health (1=healthy,0=degraded)', ['device','level'])
RAID_DISK_STATUS   = Gauge('raid_disk_status','RAID disk health (1=active,0=failed)', ['device','disk'])
RAID_REBUILD_PROGRESS = Gauge('raid_rebuild_progress','RAID rebuild progress %', ['device'])
RAID_TOTAL_DISKS   = Gauge('raid_total_disks','Total disks in array', ['device'])
RAID_ACTIVE_DISKS  = Gauge('raid_active_disks','Active disks', ['device'])
RAID_FAILED_DISKS  = Gauge('raid_failed_disks','Failed disks', ['device'])
RAID_SPARE_DISKS   = Gauge('raid_spare_disks','Spare disks', ['device'])

def parse_mdstat():
    try:
        with open('/proc/mdstat') as f:
            content = f.read()
    except FileNotFoundError:
        return []
    arrays = []
    current = None
    for line in content.splitlines():
        m = re.match(r'(md\d+)\s*:\s*(\w+)\s+(raid\d+|linear)\s+(.*)', line)
        if m:
            current = {'device': m.group(1), 'status': m.group(2), 'level': m.group(3), 'disks': []}
            arrays.append(current)
            disk_str = m.group(4)
            for d in re.findall(r'(\w+)\[(\d+)\](?:\(([FSW])\))?', disk_str):
                current['disks'].append({'name': d[0], 'index': int(d[1]), 'state': d[2] or 'active'})
        if current and ('recovery' in line or 'resync' in line):
            prog = re.search(r'(\d+\.\d+)%', line)
            if prog:
                current['rebuild_progress'] = float(prog.group(1))
    return arrays

def collect_metrics():
    for a in parse_mdstat():
        dev = a['device']; lvl = a['level']
        total = len(a['disks'])
        active = sum(1 for d in a['disks'] if d['state']=='active')
        failed = sum(1 for d in a['disks'] if d['state']=='F')
        spare  = sum(1 for d in a['disks'] if d['state']=='S')
        # /proc/mdstat reports degraded arrays as 'active', so also require zero failed members
        healthy = 1 if a['status'] == 'active' and failed == 0 else 0
        RAID_ARRAY_STATUS.labels(device=dev, level=lvl).set(healthy)
        RAID_TOTAL_DISKS.labels(device=dev).set(total)
        RAID_ACTIVE_DISKS.labels(device=dev).set(active)
        RAID_FAILED_DISKS.labels(device=dev).set(failed)
        RAID_SPARE_DISKS.labels(device=dev).set(spare)
        for d in a['disks']:
            RAID_DISK_STATUS.labels(device=dev, disk=d['name']).set(1 if d['state']=='active' else 0)
        if 'rebuild_progress' in a:
            RAID_REBUILD_PROGRESS.labels(device=dev).set(a['rebuild_progress'])
        else:
            RAID_REBUILD_PROGRESS.labels(device=dev).set(100)

if __name__ == '__main__':
    start_http_server(9100)
    while True:
        collect_metrics()
        time.sleep(30)
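To try it (saved here as raid_exporter.py, a filename chosen for this example), install the client library, run the exporter, and scrape it. Note that node_exporter also defaults to port 9100, so change the port if both run on the same host:

pip install prometheus_client
python3 raid_exporter.py &
curl -s http://localhost:9100/metrics | grep '^raid_'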

Typical Prometheus alert rules (example):

groups:
- name: raid_alerts
  rules:
  - alert: RAIDArrayDegraded
    expr: raid_array_status == 0
    for: 1m
    labels: {severity: critical}
    annotations:
      summary: "RAID array {{ $labels.device }} is degraded"
  - alert: RAIDDiskFailed
    expr: raid_failed_disks > 0
    for: 1m
    labels: {severity: critical}
    annotations:
      summary: "RAID array {{ $labels.device }} has failed disks"
  - alert: RAIDRebuildInProgress
    expr: raid_rebuild_progress < 100
    for: 5m
    labels: {severity: warning}
    annotations:
      summary: "RAID array {{ $labels.device }} is rebuilding"

Capacity & Performance Calculations

Capacity formulas

# RAID 0:   usable = N × disk_size
# RAID 1:   usable = (N/2) × disk_size
# RAID 5:   usable = (N‑1) × disk_size
# RAID 6:   usable = (N‑2) × disk_size
# RAID 10:  usable = (N/2) × disk_size
# RAID 50:  usable = (total_disks ‑ groups) × disk_size
# RAID 60:  usable = (total_disks ‑ 2×groups) × disk_size
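A worked example for eight 4 TB disks:

# 8 disks × 4 TB each
echo "RAID 5 : $(( (8-1) * 4 )) TB usable"    # 28 TB
echo "RAID 6 : $(( (8-2) * 4 )) TB usable"    # 24 TB
echo "RAID 10: $(( (8/2) * 4 )) TB usable"    # 16 TB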

Performance (SSD 50 k IOPS per disk)

# RAID 0: read/write ≈ 50k × N IOPS
# RAID 1: read ≈ 2×50k, write ≈ 50k
# RAID 5: read ≈ 50k × (N‑1), write ≈ 50k × (N‑1) / 4
# RAID 10: read ≈ 50k × N, write ≈ 50k × N / 2

Actual numbers depend on controller cache, stripe size, and workload characteristics.
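As a quick worked example, a 4‑disk array of such SSDs under the formulas above:

# 4 disks × 50k IOPS each
echo "RAID 0  write: $(( 4 * 50 ))k IOPS"            # 200k
echo "RAID 5  write: ~$(( (4-1) * 50 / 4 ))k IOPS"   # integer math; ≈37.5k
echo "RAID 10 write: $(( 4 * 50 / 2 ))k IOPS"        # 100k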

Disk Selection

SSD vs HDD : SSDs for random‑IO intensive workloads (databases, VMs). HDDs for cold storage, large sequential archives, or video surveillance.

Enterprise vs Consumer SSD : Enterprise SSDs offer higher DWPD (drive writes per day), power‑loss protection, higher MTBF, and longer warranties (typically 5 years). Consumer SSDs are cheaper but lack power‑loss protection and have lower endurance.

Interface :

SATA – up to 550 MB/s, cost‑effective for capacity‑oriented storage.

SAS – 12 Gbps, dual‑port, enterprise features.

NVMe – PCIe 4.0 x4 ≈ 7 GB/s, ultra‑low latency; ideal for performance‑critical workloads.

Stripe Size Selection

Stripe (chunk) size influences sequential vs random performance.

Large sequential workloads (video, large files): 256 KB – 1 MB.

Random small‑block workloads (databases): 64 KB – 128 KB.

Check current size with mdadm --detail /dev/md0 | grep "Chunk Size" and set during creation with --chunk=256 (value in KB).
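For example, creating a RAID 5 array with a 256 KB chunk and confirming it took effect:

mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=256 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde
mdadm --detail /dev/md0 | grep "Chunk Size"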

Typical recommendations:

Databases: 64 KB or 128 KB.

File servers: 256 KB.

Virtualization: 128 KB.

Video streaming: 1 MB.

Best Practices

Use identical model and capacity disks to avoid performance imbalance.

Reserve hot‑spare disks (software RAID: mdadm --add-spare /dev/md0 /dev/sdf).

Enable write‑back cache only when BBU/Capacitor is healthy (StorCLI: wrcache=WB).

Schedule Patrol Read / SMART checks (StorCLI: storcli /c0 set patrolread=on; smartd on Linux).

Align partitions to stripe size (e.g., parted /dev/md0 mkpart primary 1MiB 100% then parted /dev/md0 align-check optimal 1).

Filesystem creation with stripe parameters :

XFS: mkfs.xfs -d su=256k,sw=3 /dev/md0p1

EXT4: mkfs.ext4 -E stride=64,stripe-width=192 /dev/md0p1

Tune rebuild speed on Linux via /proc/sys/dev/raid/speed_limit_min and speed_limit_max.
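The ext4 numbers follow from the geometry: stride = chunk size ÷ filesystem block size, and stripe‑width = stride × number of data disks. A quick sketch for a 4‑disk RAID 5 with a 256 KB chunk and 4 KB blocks (matching the values above):

CHUNK_KB=256; BLOCK_KB=4; DATA_DISKS=3    # 4-disk RAID 5 => 3 data disks
STRIDE=$(( CHUNK_KB / BLOCK_KB ))         # 64
STRIPE_WIDTH=$(( STRIDE * DATA_DISKS ))   # 192
mkfs.ext4 -E stride=$STRIDE,stripe-width=$STRIPE_WIDTH /dev/md0p1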

Hardware RAID cache policies – Write Back (WB) + Read Ahead (RA) for databases; fall back to Write Through (WT) if BBU fails.

Encryption – Use LUKS on top of the RAID device when data‑at‑rest protection is required.
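A minimal LUKS‑on‑RAID sketch (destructive; assumes /dev/md0 holds no data yet and /data/secure is the intended mount point):

cryptsetup luksFormat /dev/md0            # prompts for a passphrase
cryptsetup open /dev/md0 secure_raid      # maps /dev/mapper/secure_raid
mkfs.xfs /dev/mapper/secure_raid
mkdir -p /data/secure
mount /dev/mapper/secure_raid /data/secure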

Backup strategy – RAID is not a backup. Follow the 3‑2‑1 rule (3 copies, 2 media types, 1 off‑site).

Troubleshooting

Degraded array : Identify the failed disk with cat /proc/mdstat and mdadm --detail, mark it failed and remove it (mdadm --manage /dev/md0 --fail /dev/sdd --remove /dev/sdd), replace the hardware, and add the new disk back (mdadm --manage /dev/md0 --add /dev/sdd). Monitor the rebuild via watch cat /proc/mdstat.
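The same sequence as a copy‑paste block (assuming /dev/sdd failed and its replacement appears under the same device name):

mdadm --detail /dev/md0                            # confirm which member failed
mdadm --manage /dev/md0 --fail /dev/sdd --remove /dev/sdd
# ...physically swap the disk...
mdadm --manage /dev/md0 --add /dev/sdd             # rebuild starts automatically
watch cat /proc/mdstat                             # follow rebuild progress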

Complete array failure : Attempt a manual assembly (mdadm --assemble --scan or mdadm --assemble /dev/md0 /dev/sdb /dev/sdc ... --force). If unsuccessful, restore from backup.

Rebuild failure (RAID 5/6) : Stop I/O, use ddrescue to clone the failing disk, then replace and rebuild. Prefer RAID 6 or RAID 10 for large mechanical disks.

Controller battery/BBU failure : Verify with storcli /c0 /bbu show. Replace the battery or capacitor; avoid write‑back cache until repaired.

2025 RAID Trends

NVMe RAID : Direct‑attach NVMe SSDs provide multi‑TB/s bandwidth. Intel VROC and AMD RAIDXpert enable RAID‑like protection without a separate card.

Distributed storage superseding RAID : Ceph, MinIO, HDFS provide erasure coding and self‑healing across nodes, reducing the need for traditional RAID in large clusters.

ZFS and Btrfs : Built‑in redundancy (RAID‑Z in ZFS, mirror and parity profiles in Btrfs), checksumming, snapshots, compression, and self‑repair. Ideal for new deployments where data integrity is paramount.

Cloud‑native storage : Kubernetes StorageClasses abstract underlying RAID or cloud volumes (e.g., AWS io2 SSD with guaranteed IOPS). The application sees a PersistentVolume; RAID is managed by the provider.

Conclusion

RAID remains a fundamental building block for data reliability and performance, but it must be chosen, configured, and maintained carefully. Key take‑aways:

Never treat RAID as a backup – implement a robust 3‑2‑1 backup strategy.

Deploy continuous monitoring (email/Prometheus) for degradation, failures, and rebuild progress.

Keep hot‑spare disks ready to reduce mean‑time‑to‑repair.

Avoid RAID 5 on large mechanical disks; prefer RAID 6 or RAID 10 for critical workloads.

Use enterprise‑grade SSDs for performance‑critical systems.

Regularly test recovery procedures and backup restores.

Following these practices will help you avoid costly data‑loss incidents and keep your storage infrastructure reliable and performant.
