When the outage ends but your array doesn’t come back clean
After a power loss, you expect your RAID 60 to come back online exactly as before.
Instead you see:
- multiple “Degraded” flags,
- maybe one group “Optimal” and the other “Degraded,”
- or the RAID 60 volume returning with warnings and strange behavior.
Drives look mostly “Good,” yet the controller clearly no longer trusts the array.
This is not just a nuisance state — it is your controller telling you the groups no longer agree on write history.
1. What’s Actually Going Wrong (Plain English)
RAID 60 depends on two RAID-6 groups staying in lockstep about:
- parity epochs
- stripe ordering
- commit sequences
- controller-side metadata
During a power event, three things commonly go wrong:
- cached writes are not fully flushed,
- NVRAM and on-disk headers get out of sync,
- one group finishes a write cycle that the other never completes.
After reboot, the controller sees:
- mismatched epochs between groups,
- headers that don’t fully agree on geometry or sequence,
- incomplete or partially committed parity.
It responds by dropping the array into Degraded state to avoid blindly trusting mismatched parity math.
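To make the "lockstep" requirement concrete, the sketch below models how a RAID 60 logical address maps onto the two RAID-6 groups. The chunk size and member counts are illustrative assumptions, not your controller's real geometry, and per-stripe parity rotation is left out:

```python
# Toy model of RAID 60 addressing: a RAID-0 stripe laid over two RAID-6 groups.
# CHUNK and the member counts are assumptions for illustration only; real
# controllers also rotate the P/Q parity positions per stripe (not modeled).

CHUNK = 256 * 1024               # bytes per chunk (assumed)
GROUPS = 2                       # top-level RAID-0 width: two RAID-6 groups
DRIVES_PER_GROUP = 6             # each group: 6 members = 4 data + P + Q
DATA_PER_GROUP = DRIVES_PER_GROUP - 2

def locate(logical_offset: int):
    """Map a logical byte offset to (group, stripe within group, data chunk)."""
    chunk_no = logical_offset // CHUNK       # which top-level chunk
    group = chunk_no % GROUPS                # RAID 0 alternates between groups
    chunk_in_group = chunk_no // GROUPS      # that group's own chunk counter
    stripe = chunk_in_group // DATA_PER_GROUP
    data_chunk = chunk_in_group % DATA_PER_GROUP
    return group, stripe, data_chunk

if __name__ == "__main__":
    for off in (0, CHUNK, 5 * CHUNK, 123 * CHUNK + 17):
        print(f"offset {off:>10} -> group/stripe/chunk {locate(off)}")
```

Every write that spans both groups has to land, and have its parity committed, on both sides of that mapping. After an unclean shutdown the controller can no longer assume that happened, which is exactly why it downgrades the whole volume rather than guessing.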
2. How This Failure Shows Up
Typical signs after power loss:
- Event logs mention:
- “unclean shutdown,”
- “cache not flushed,”
- “consistency check required,”
- or “parity inconsistent.”
- One RAID-6 group might show:
- Optimal / OK
- The other group might show:
- Degraded,
- foreign,
- or “requires initialization/verify.”
- RAID 60 volume:
- comes up as Degraded,
- may be slow or unstable,
- mounts but throws I/O or filesystem errors under load.
The key signal: no obvious drive failures, but the controller refuses to declare the array fully healthy.
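If your controller lets you export the event log as plain text (the export path and exact wording vary by vendor, so the filename and phrases here are assumptions), a quick keyword scan helps confirm you are looking at an unclean-shutdown case rather than a failed drive:

```python
# Scan an exported controller event log (assumed plain-text export) for the
# unclean-shutdown indicators listed above. The filename is hypothetical.
import re

INDICATORS = [
    "unclean shutdown",
    "cache not flushed",
    "consistency check required",
    "parity inconsistent",
]

def scan_log(path: str):
    hits = []
    with open(path, errors="replace") as fh:
        for lineno, line in enumerate(fh, 1):
            if any(re.search(re.escape(p), line, re.IGNORECASE) for p in INDICATORS):
                hits.append((lineno, line.rstrip()))
    return hits

if __name__ == "__main__":
    for lineno, line in scan_log("controller_events.txt"):   # hypothetical export name
        print(f"{lineno:>6}: {line}")
```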
3. What This Means for Your Data
Degraded-after-power-loss in RAID 60 usually means:
- At least one group is out of step with the other, and
- The controller knows it cannot safely treat both parity domains as synchronized.
Your data may still be there, but:
- consistency is no longer guaranteed,
- reads may be coming from a mixture of old and new parity states,
- any attempted repair or rebuild can lock in the wrong epoch permanently.
If you keep running heavy workloads on a degraded RAID 60 after power loss, you risk:
- silent corruption,
- directories that later appear empty,
- and failures that no simple rebuild can undo.
4. What NOT To Do
Do not respond to this like a simple “one drive failed” RAID case.
Avoid:
- forcing a rebuild immediately,
- running “verify and fix” or “consistency check with repair,”
- resetting or reinitializing the array,
- importing or clearing foreign configs blindly,
- running filesystem repair on the live array,
- swapping additional drives “just to be safe.”
Each of these can:
- overwrite valid parity,
- cement the wrong group as “authoritative,”
- or spread localized corruption across both parity domains.
5. Correct Triage After Power Loss
Step 1 — Freeze the state
- Disable any scheduled consistency checks that attempt “auto-fix.”
- Stop further writes if at all possible.
Step 2 — Clone every drive
- Make bit-level images of all members, not just disks flagged as Degraded.
- Preserve slot → serial → WWN mapping.
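Bit-level imaging itself is usually done with a dedicated tool such as GNU ddrescue on an imaging host where the members appear as plain block devices (for example on an HBA, not behind the original RAID controller). What is easy to lose at this stage is the mapping, so here is a minimal sketch, assuming a Linux host with `lsblk`, that records serial and WWN data next to the images; physical slot numbers are not visible to the OS and have to be filled in by hand:

```python
# Record a serial/WWN inventory for every disk the imaging host can see, so
# the slot -> serial -> WWN mapping is preserved alongside the images.
# Assumes a Linux host with util-linux lsblk; physical slot numbers must be
# added manually from your own bay labeling.
import json
import subprocess

def disk_inventory():
    out = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,SERIAL,WWN,SIZE,MODEL"],
        capture_output=True, text=True, check=True,
    ).stdout
    disks = json.loads(out)["blockdevices"]
    # "slot" is a placeholder: replace None with the physical bay each disk came from.
    return [{"slot": None, **d} for d in disks]

if __name__ == "__main__":
    with open("drive_map.json", "w") as fh:
        json.dump(disk_inventory(), fh, indent=2)
    print("Wrote drive_map.json - now fill in the 'slot' fields by hand.")
```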
Step 3 — Analyze each RAID-6 group offline
- Reconstruct Group A and Group B virtually.
- Compare:
- epoch counters,
- parity alignment,
- stripe ordering,
- group-level metadata.
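A cheap per-group coherence probe is row (P) parity: in RAID 6 the data chunks of a stripe XOR to P, so data XOR P is zero. The Q syndrome and the parity rotation pattern are controller-specific, which the sketch below sidesteps by looking for the one chunk (Q) that equals the XOR of all the others; chunk size, data offset, and image names are assumptions:

```python
# Sample P-parity coherence for one RAID-6 group using the disk images only
# (never the originals). For a healthy stripe, data XOR P == 0; Q's position
# rotates per stripe, so instead of assuming it we check whether some chunk
# equals the XOR of all chunks (that chunk is then Q).
from functools import reduce

CHUNK = 256 * 1024        # assumed chunk size
DATA_START = 0            # assumed start of the RAID data area inside each image

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def stripe_chunks(image_paths, stripe_no):
    """Read the chunk each member contributes to a given stripe."""
    chunks = []
    for path in image_paths:
        with open(path, "rb") as fh:
            fh.seek(DATA_START + stripe_no * CHUNK)
            chunks.append(fh.read(CHUNK))
    return chunks

def p_parity_ok(chunks) -> bool:
    if len({len(c) for c in chunks}) != 1 or not chunks[0]:
        raise ValueError("short read - stripe lies outside one of the images")
    total = reduce(xor_bytes, chunks)            # equals Q when P parity holds
    return any(chunk == total for chunk in chunks)

if __name__ == "__main__":
    group_a = [f"a{i}.img" for i in range(6)]    # hypothetical member image names
    for stripe in (0, 1_000, 50_000):
        status = "OK" if p_parity_ok(stripe_chunks(group_a, stripe)) else "MISMATCH"
        print(f"group A, stripe {stripe}: P parity {status}")
```

Because XOR does not care about order, this probe works before you have pinned down the member order or rotation; reconstructing actual data, of course, requires both.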
Step 4 — Identify which group advanced further
- The group with the latest coherent parity epoch is usually authoritative.
- The other may represent a rolled-back or partially updated state.
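Where the epoch or sequence counter lives is entirely vendor-specific (DDF-style headers on disk, NVRAM mirrors, or both), so the offsets in the sketch below are placeholders; what matters is the decision rule: prefer the group whose newest counter also passes the coherence checks from the previous step.

```python
# Decision sketch: treat as authoritative the RAID-6 group with the highest
# sequence/epoch counter that is also parity-coherent. META_OFFSET and
# SEQ_FIELD are PLACEHOLDERS - real controllers keep this in vendor-specific
# headers and/or NVRAM, and the field must be located case by case.
import struct

META_OFFSET = 0x1000      # hypothetical location of a per-member metadata header
SEQ_FIELD = 0x20          # hypothetical offset of a 64-bit sequence counter

def member_sequence(image_path: str) -> int:
    with open(image_path, "rb") as fh:
        fh.seek(META_OFFSET + SEQ_FIELD)
        (seq,) = struct.unpack("<Q", fh.read(8))
    return seq

def group_epoch(image_paths) -> int:
    # Members of one group should agree; flag stragglers for manual review.
    seqs = [member_sequence(p) for p in image_paths]
    if len(set(seqs)) > 1:
        print("warning: members disagree within this group:", seqs)
    return max(seqs)

def pick_authoritative(group_a, group_b, a_coherent: bool, b_coherent: bool):
    candidates = []
    if a_coherent:
        candidates.append(("A", group_epoch(group_a)))
    if b_coherent:
        candidates.append(("B", group_epoch(group_b)))
    if not candidates:
        return None          # neither group can be trusted blindly - stop here
    return max(candidates, key=lambda c: c[1])[0]
```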
Step 5 — Rebuild RAID 60 in a virtual model
- Once both groups are understood, assemble the RAID 60 layout offline.
- Check that filesystem metadata (superblocks, MFT/inodes, journals) is consistent.
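Once each group reads back as a coherent linear volume (written out as group_a.img and group_b.img below, names assumed), the top-level RAID-0 interleave is easy to model; this sketch re-stripes the two group volumes and then looks for a couple of well-known filesystem signatures as a first sanity check:

```python
# Re-stripe two reconstructed RAID-6 group volumes into one RAID 60 image and
# sanity-check for familiar filesystem signatures. CHUNK and the image names
# are assumptions - use the geometry you actually established.
CHUNK = 256 * 1024

def assemble_raid60(group_images, out_path):
    handles = [open(p, "rb") for p in group_images]
    try:
        with open(out_path, "wb") as out:
            while True:
                wrote = False
                for fh in handles:             # RAID 0: alternate chunks across groups
                    chunk = fh.read(CHUNK)
                    if chunk:
                        out.write(chunk)
                        wrote = True
                if not wrote:
                    break
    finally:
        for fh in handles:
            fh.close()

def quick_fs_check(path):
    with open(path, "rb") as fh:
        head = fh.read(512)
        fh.seek(0x438)                         # ext2/3/4 superblock magic at 0x438
        ext_magic = fh.read(2)
    if head[3:7] == b"NTFS":
        print("looks like NTFS (boot sector signature present)")
    elif ext_magic == b"\x53\xef":
        print("looks like ext2/3/4 (superblock magic present)")
    else:
        print("no filesystem signature at offset 0 - check for a partition table or LVM first")

if __name__ == "__main__":
    assemble_raid60(["group_a.img", "group_b.img"], "raid60.img")
    quick_fs_check("raid60.img")
```

If the volume was partitioned or sat under LVM, the signatures will not be at offset 0; the point is only to confirm the interleave looks sane before deeper filesystem checks.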
Step 6 — Recover onto clean storage
- Mount the reconstructed array in a controlled environment.
- Copy data to a new, known-good platform.
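When copying out, it is worth hashing every file as it lands so the new copy can be verified independently later; a minimal sketch (paths are examples only) follows:

```python
# Copy the recovered tree to known-good storage and record a SHA-256 for
# every file, so the destination can be re-verified later (sha256sum -c style).
import hashlib
import shutil
from pathlib import Path

def copy_and_hash(src_root: str, dst_root: str, manifest: str):
    src, dst = Path(src_root), Path(dst_root)
    with open(manifest, "w") as mf:
        for path in sorted(src.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(src)
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                for block in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(block)
            mf.write(f"{digest.hexdigest()}  {rel}\n")

if __name__ == "__main__":
    # Example paths only - point these at the mounted reconstruction and the new volume.
    copy_and_hash("/mnt/reconstructed", "/mnt/new_storage", "recovery_manifest.sha256")
```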
Diagnostic Overview
- Device: RAID 60 array (two RAID-6 groups striped together at the RAID-0 level)
- Observed State: Array shows Degraded after power loss; groups appear present but not fully trusted
- Likely Cause: Cache/NVRAM not fully flushed, resulting in parity-epoch and metadata mismatches between groups
- Do NOT: Force rebuild, run verify-and-fix, initialize, or clear/import foreign configs without imaging
- Recommended Action: Clone all disks, reconstruct each RAID-6 group offline, determine authoritative epoch, then virtualize RAID 60 for safe recovery