When Redundancy Stops Being Redundant — The Moment Two Drives Fall
It always seems to happen at the worst time—during maintenance, a firmware update, or a late-night rebuild.
One drive fails and you relax: RAID 6 can handle that. Then another drops, and the array freezes, even though dual parity is designed to survive two losses.
That quiet moment isn’t just silence; it’s the point where safety becomes fragility.
We’ve seen this before, and it’s often recoverable, provided you act in the right order.
What You See
- RAID utility reports Degraded or Failed – 2 Members Offline.
- Rebuild may start, stall, or never begin.
- Controller logs show Foreign Config Detected, Offline Member, or Critical VD.
- OS sees no volume or a blank partition table.
- LEDs show activity on only a subset of drives.
Why It Happens
- The second “failure” is often a latent sector error uncovered mid-rebuild.
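- A cache/NVRAM mismatch leads the controller to mark a healthy drive offline to protect parity integrity.
- A firmware fault or power interruption leaves members disagreeing about the array’s latest state (the parity epoch).
- Re-seating drives or changing slots can alter the member order the controller expects.
- Once parity no longer agrees with the data across members, the controller locks the array to prevent further damage.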
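To make the parity arithmetic concrete, here is a minimal Python sketch of dual-parity (P and Q) reconstruction over GF(2^8). The field polynomial 0x11D and generator 2 are assumptions (they follow the common Linux md convention), and the function names are hypothetical; hardware controllers may use a different layout. It illustrates two points: two missing data blocks can be rebuilt from the survivors plus P and Q, and one stale survivor quietly yields wrong data, which is one reason a controller refuses to continue once members disagree.

```python
from typing import List, Optional, Tuple


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the assumed polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^254 equals a^-1 for any nonzero byte."""
    return gf_pow(a, 254)


def compute_pq(data: List[bytes]) -> Tuple[bytes, bytes]:
    """P is the XOR of all data blocks; Q weights block i by 2^i first."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data: List[Optional[bytes]], x: int, y: int,
                p: bytes, q: bytes) -> Tuple[bytes, bytes]:
    """Rebuild missing data blocks x and y from the survivors plus P and Q."""
    a = bytearray(p)  # becomes D_x ^ D_y
    b = bytearray(q)  # becomes 2^x * D_x ^ 2^y * D_y
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            a[j] ^= byte
            b[j] ^= gf_mul(coeff, byte)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx = bytes(gf_mul(inv, b[j] ^ gf_mul(gy, a[j])) for j in range(len(p)))
    dy = bytes(a[j] ^ dx[j] for j in range(len(p)))
    return dx, dy


# Four illustrative data blocks; members 1 and 3 are then "lost".
data = [b"alpha123", b"bravo456", b"charl789", b"delta000"]
p, q = compute_pq(data)
dx, dy = recover_two([data[0], None, data[2], None], 1, 3, p, q)
print(dx == data[1], dy == data[3])  # True True

# A survivor that missed one write (stale last byte) corrupts the rebuild.
stale = [b"alpha12X", None, data[2], None]
bad_dx, _ = recover_two(stale, 1, 3, p, q)
print(bad_dx == data[1])  # False
```

Nothing in the arithmetic flags the stale block; the rebuild simply returns different bytes. That is why member consistency has to be established from metadata and images before any reconstruction is attempted.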
What NOT To Do
- Do not import or force members online until every member and its metadata have been imaged.
- Do not clear foreign configs; that erases the identifiers needed to reconstruct the array layout.
- Do not swap ports/cables to “test” (mapping may change).
- Do not run CHKDSK/fsck; they rewrite file system structures on top of stripes that may be inconsistent.
- Do not trust backups made while the array was degraded until they have been verified.
What You CAN Do
- Image every member, read-only, before taking any controller action.
- Preserve controller logs and NVRAM snapshots.
- Document slot order, serials, firmware revisions.
- Review power and thermal history for events that could explain several drives dropping together.
- Engage a recovery engineer early—untouched data is everything.
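The imaging and documentation steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a substitute for a dedicated imaging tool behind a hardware write blocker: the function names, manifest fields, and paths are hypothetical, and the source is only ever opened for reading. Blocks that cannot be read are zero-filled and their offsets recorded, so the image keeps the member's original geometry.

```python
import hashlib
import json
import os


def image_member(source_path, image_path, block_size=1024 * 1024):
    """Copy a member to an image file without ever writing to the source."""
    sha = hashlib.sha256()
    bad_offsets = []
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        total = src.seek(0, os.SEEK_END)  # size of the source in bytes
        offset = 0
        while offset < total:
            want = min(block_size, total - offset)
            try:
                src.seek(offset)
                chunk = src.read(want)
            except OSError:
                chunk = b""  # treat an I/O error as an unreadable block
            if len(chunk) < want:
                bad_offsets.append(offset)
                chunk += b"\x00" * (want - len(chunk))  # keep offsets aligned
            dst.write(chunk)
            sha.update(chunk)  # the hash covers the image as written
            offset += want
    return {"size": total, "sha256": sha.hexdigest(), "bad_offsets": bad_offsets}


def write_manifest(manifest_path, members):
    """Save slot order, serials, firmware, and image hashes as JSON."""
    with open(manifest_path, "w", encoding="utf-8") as fh:
        json.dump(members, fh, indent=2)


# Hypothetical usage; slot, serial, and firmware values come from your own notes:
# info = image_member("/path/to/member0", "/path/to/images/slot0.img")
# write_manifest("/path/to/images/manifest.json",
#                [{"slot": 0, "serial": "<serial>", "firmware": "<rev>", **info}])
```

Recording a hash per image lets every later analysis step confirm it is working from an unmodified copy, and the list of bad offsets shows which regions of a member could not be read.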
Diagnostic Overview
- Array Type: RAID 6 — Dual Parity Set
- Controller State: Two Members Offline / Virtual Disk Inaccessible
- Likely Cause: Sequential Drive Failure or Parity Desync During Rebuild
- Do NOT: Import or Rebuild Before Imaging and Metadata Capture
- Recommended Action: Clone Members, Preserve Logs and NVRAM, Engage ADR Triage Flow