Add a description of how to possibly recover a RAID set in the
event of a multiple disk failure.
parent 15d16b2223
commit c4aed2da0e
@@ -1,4 +1,4 @@
-.\" $NetBSD: raidctl.8,v 1.26 2001/11/16 11:06:46 wiz Exp $
+.\" $NetBSD: raidctl.8,v 1.27 2002/01/20 02:30:11 oster Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
@@ -962,6 +962,93 @@ raidctl -F component1 raid0
at which point the data missing from
.Sq component1
would be reconstructed onto /dev/sd3e.
.Pp
When more than one component is marked as
.Sq failed
due to a non-component hardware failure (e.g. loss of power to two
components, adapter problems, termination problems, or cabling issues) it
is quite possible to recover the data on the RAID set. The first
thing to be aware of is that the first disk to fail will almost certainly
be out-of-sync with the remainder of the array. If any IO was
performed between the time the first component is considered
.Sq failed
and when the second component is considered
.Sq failed ,
then the first component to fail will
.Ar not
contain correct data, and should be ignored. When the second
component is marked as failed, however, the RAID device will
(currently) panic the system. At this point the data on the RAID set
(not including the first failed component) is still self-consistent,
and will be in no worse state of repair than had the power gone out in
the middle of a write to a filesystem on a non-RAID device.
The problem, however, is that the component labels may now have 3
different 'modification counters' (one value on the first component
that failed, one value on the second component that failed, and a
third value on the remaining components). In such a situation, the
RAID set will not autoconfigure, and can only be forcibly re-configured
with the
.Fl C
option. To recover the RAID set, one must first remedy whatever physical
problem caused the multiple-component failure. After that is done,
the RAID set can be restored by forcibly configuring the raid set
.Ar without
the component that failed first. For example, if /dev/sd1e and
/dev/sd2e fail (in that order) in a RAID set of the following
configuration:
.Bd -literal -offset indent
START array
1 4 0

START drives
/dev/sd1e
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100

.Ed
.Pp
then the following configuration (say "recover_raid0.conf")
.Bd -literal -offset indent
START array
1 4 0

START drives
/dev/sd6e
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
.Ed
.Pp
(where /dev/sd6e has no physical device) can be used with
.Bd -literal -offset indent
raidctl -C recover_raid0.conf raid0
.Ed
.Pp
to force the configuration of raid0. A
.Bd -literal -offset indent
raidctl -I 12345 raid0
.Ed
.Pp
will be required in order to synchronize the component labels.
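.Pp
Once the forced configuration and label synchronization have been done,
the state of the set can be confirmed with the usual status query, for
example:
.Bd -literal -offset indent
raidctl -s raid0
.Ed
.Pp
where the slot that was configured with the non-existent /dev/sd6e
should show up as failed.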
At this point the filesystems on the RAID set can then be checked and
corrected.
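.Pp
For example, if the set carries a single filesystem in the
.Sq a
partition, the check might be no more than (the partition letter here is
purely illustrative):
.Bd -literal -offset indent
fsck /dev/rraid0a
.Ed
.Pp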
To complete the re-construction of the RAID set, /dev/sd1e is simply
hot-added back into the array, and reconstructed as described earlier.
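.Pp
Assuming the slot left empty by the missing /dev/sd6e appears in the
status output as
.Sq component0
(the actual name should be taken from the output of raidctl -s), that
final step would look something like:
.Bd -literal -offset indent
raidctl -a /dev/sd1e raid0
raidctl -F component0 raid0
.Ed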
.Ss RAID on RAID
RAID sets can be layered to create more complex and much larger RAID
sets. A RAID 0 set, for example, could be constructed from four RAID