.\" $NetBSD: raidctl.8,v 1.28 2002/01/21 11:40:20 wiz Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the NetBSD
.\" Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\" contributors may be used to endorse or promote products derived
.\" from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
.\" School of Computer Science
.\" Carnegie Mellon University
.\" Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd July 10, 2001
.Dt RAIDCTL 8
.Os
.Sh NAME
.Nm raidctl
.Nd configuration utility for the RAIDframe disk driver
.Sh SYNOPSIS
.Nm ""
.Op Fl v
.Fl a Ar component Ar dev
.Nm ""
.Op Fl v
.Fl A Op yes | no | root
.Ar dev
.Nm ""
.Op Fl v
.Fl B Ar dev
.Nm ""
.Op Fl v
.Fl c Ar config_file Ar dev
.Nm ""
.Op Fl v
.Fl C Ar config_file Ar dev
.Nm ""
.Op Fl v
.Fl f Ar component Ar dev
.Nm ""
.Op Fl v
.Fl F Ar component Ar dev
.Nm ""
.Op Fl v
.Fl g Ar component Ar dev
.Nm ""
.Op Fl v
.Fl G Ar dev
.Nm ""
.Op Fl v
.Fl i Ar dev
.Nm ""
.Op Fl v
.Fl I Ar serial_number Ar dev
.Nm ""
.Op Fl v
.Fl p Ar dev
.Nm ""
.Op Fl v
.Fl P Ar dev
.Nm ""
.Op Fl v
.Fl r Ar component Ar dev
.Nm ""
.Op Fl v
.Fl R Ar component Ar dev
.Nm ""
.Op Fl v
.Fl s Ar dev
.Nm ""
.Op Fl v
.Fl S Ar dev
.Nm ""
.Op Fl v
.Fl u Ar dev
.Sh DESCRIPTION
.Nm ""
is the user-land control program for
.Xr raid 4 ,
the RAIDframe disk device.
.Nm ""
is primarily used to dynamically configure and unconfigure RAIDframe disk
devices. For more information about the RAIDframe disk device, see
.Xr raid 4 .
.Pp
This document assumes the reader has at least rudimentary knowledge of
RAID and RAID concepts.
.Pp
The command-line options for
.Nm
are as follows:
.Bl -tag -width indent
.It Fl a Ar component Ar dev
Add
.Ar component
as a hot spare for the device
.Ar dev .
.It Fl A Ic yes Ar dev
Make the RAID set auto-configurable. The RAID set will be
automatically configured at boot
.Em before
the root file system is
mounted. Note that all components of the set must be of type RAID in the
disklabel.
.It Fl A Ic no Ar dev
Turn off auto-configuration for the RAID set.
.It Fl A Ic root Ar dev
Make the RAID set auto-configurable, and also mark the set as being
eligible to be the root partition. A RAID set configured this way
will
.Em override
the use of the boot disk as the root device. All components of the
set must be of type RAID in the disklabel. Note that the kernel being
booted must currently reside on a non-RAID set.
.It Fl B Ar dev
Initiate a copyback of reconstructed data from a spare disk to
its original disk. This is performed after a component has failed,
and the failed drive has been reconstructed onto a spare drive.
.It Fl c Ar config_file Ar dev
Configure the RAIDframe device
.Ar dev
according to the configuration given in
.Ar config_file .
A description of the contents of
.Ar config_file
is given in
.Sx Configuration file
below.
.It Fl C Ar config_file Ar dev
As for
.Fl c ,
but forces the configuration to take place. This is required the
first time a RAID set is configured.
.It Fl f Ar component Ar dev
Mark the specified
.Ar component
as having failed, but do not initiate a reconstruction of that
component.
.It Fl F Ar component Ar dev
Fail the specified
.Ar component
of the device, and immediately begin a reconstruction of the failed
disk onto an available hot spare. This is one of the mechanisms used to start
the reconstruction process if a component does have a hardware failure.
.It Fl g Ar component Ar dev
Get the component label for the specified component.
.It Fl G Ar dev
Generate the configuration of the RAIDframe device in a format suitable for
use with
.Nm
.Fl c
or
.Fl C .
.It Fl i Ar dev
Initialize the RAID device. In particular, (re-)write the parity on
the selected device. This
.Em MUST
be done for
.Em all
RAID sets before the RAID device is labeled and before
file systems are created on the RAID device.
.It Fl I Ar serial_number Ar dev
Initialize the component labels on each component of the device.
.Ar serial_number
is used as one of the keys in determining whether a
particular set of components belongs to the same RAID set. While not
strictly enforced, different serial numbers should be used for
different RAID sets. This step
.Em MUST
be performed when a new RAID set is created.
.It Fl p Ar dev
Check the status of the parity on the RAID set. Display a status
message, and return successfully if the parity is up-to-date.
.It Fl P Ar dev
Check the status of the parity on the RAID set, and initialize
(re-write) the parity if the parity is not known to be up-to-date.
This is normally used after a system crash (and before a
.Xr fsck 8 )
to ensure the integrity of the parity.
.It Fl r Ar component Ar dev
Remove the spare disk specified by
.Ar component
from the set of available spare components.
.It Fl R Ar component Ar dev
Fail the specified
.Ar component ,
if necessary, and immediately begin a reconstruction back to
.Ar component .
This is useful for reconstructing back onto a component after
it has been replaced following a failure.
.It Fl s Ar dev
Display the status of the RAIDframe device for each of the components
and spares.
.It Fl S Ar dev
Check the status of parity re-writing, component reconstruction, and
component copyback. The output indicates the amount of progress
achieved in each of these areas.
.It Fl u Ar dev
Unconfigure the RAIDframe device.
.It Fl v
Be more verbose. For operations such as reconstructions, parity
re-writing, and copybacks, provide a progress indicator.
.El
.Pp
The device used by
.Nm
is specified by
.Ar dev .
.Ar dev
may be either the full name of the device, e.g. /dev/rraid0d
for the i386 architecture or /dev/rraid0c
for all others, or simply raid0 (for /dev/rraid0d on i386,
or /dev/rraid0c elsewhere).
.Ss Configuration file
The format of the configuration file is complex, and
only an abbreviated treatment is given here. In the configuration
files, a
.Sq #
indicates the beginning of a comment.
.Pp
There are 4 required sections of a configuration file, and 2
optional sections. Each section begins with a
.Sq START ,
followed by
the section name, and the configuration parameters associated with that
section. The first section is the
.Sq array
section, and it specifies
the number of rows, columns, and spare disks in the RAID set. For
example:
.Bd -literal -offset indent
START array
1 3 0
.Ed
.Pp
indicates an array with 1 row, 3 columns, and 0 spare disks. Note
that although multi-dimensional arrays may be specified, they are
.Em NOT
supported in the driver.
.Pp
The second section, the
.Sq disks
section, specifies the actual
components of the device. For example:
.Bd -literal -offset indent
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
.Ed
.Pp
specifies the three component disks to be used in the RAID device. If
any of the specified drives cannot be found when the RAID device is
configured, then they will be marked as
.Sq failed ,
and the system will
operate in degraded mode. Note that it is
.Em imperative
that the order of the components in the configuration file does not
change between configurations of a RAID device. Changing the order
of the components will result in data loss if the set is configured
with the
.Fl C
option. In normal circumstances, the RAID set will not configure if
only
.Fl c
is specified, and the components are out-of-order.
.Pp
The next section, which is the
.Sq spare
section, is optional, and, if
present, specifies the devices to be used as
.Sq hot spares :
devices which are on-line, but are not actively used by the RAID driver
unless one of the main components fails. A simple
.Sq spare
section might be:
.Bd -literal -offset indent
START spare
/dev/sd3e
.Ed
.Pp
for a configuration with a single spare component. If no spare drives
are to be used in the configuration, then the
.Sq spare
section may be omitted.
.Pp
The next section is the
.Sq layout
section. This section describes the
general layout parameters for the RAID device, and provides such
information as sectors per stripe unit, stripe units per parity unit,
stripe units per reconstruction unit, and the parity configuration to
use. This section might look like:
.Bd -literal -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
.Ed
.Pp
The sectors per stripe unit value specifies, in blocks, the interleave
factor; i.e. the number of contiguous sectors to be written to each
component for a single stripe. Appropriate selection of this value
(32 in this example) is the subject of much research in RAID
architectures. The stripe units per parity unit and
stripe units per reconstruction unit are normally each set to 1.
While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 is outside
the scope of this document. The last value in this section (5 in this
example) indicates the parity configuration desired. Valid entries
include:
.Bl -tag -width inde
.It 0
RAID level 0. No parity, only simple striping.
.It 1
RAID level 1. Mirroring. The parity is the mirror.
.It 4
RAID level 4. Striping across components, with parity stored on the
last component.
.It 5
RAID level 5. Striping across components, parity distributed across
all components.
.El
.Pp
There are other valid entries here, including those for Even-Odd
parity, RAID level 5 with rotated sparing, Chained declustering,
and Interleaved declustering, but as of this writing the code for
those parity operations has not been tested with
.Nx .
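.Pp
As an illustration of the interleave factor, the following Python
sketch (not part of
.Nm )
maps a logical sector of a one-row RAID 0 set to a column and to a
sector offset on that column's component.
It assumes plain round-robin striping and ignores any space the driver
reserves on each component, so it is a simplified model rather than the
RAIDframe layout code; the function name
.Sq raid0_map
is hypothetical.
.Bd -literal -offset indent
def raid0_map(lba, sect_per_su, num_cols):
    """Return (column, sector on that component) for logical sector lba.

    Simplified round-robin RAID 0 model; driver-reserved space on each
    component is not taken into account.
    """
    su = lba // sect_per_su      # stripe unit that holds the sector
    offset = lba % sect_per_su   # position of the sector inside it
    column = su % num_cols       # stripe units go to the columns in turn
    stripe = su // num_cols      # complete stripes before this unit
    return column, stripe * sect_per_su + offset
.Ed
.Pp
With 32 sectors per stripe unit and 3 columns, logical sector 100 lies
in stripe unit 3, which wraps around to column 0 at component sector 36.
Because the column number is simply a position in the
.Sq disks
section, the model also shows why the order of the components must not
change between configurations.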
.Pp
The next required section is the
.Sq queue
section. This is most often
specified as:
.Bd -literal -offset indent
START queue
fifo 100
.Ed
.Pp
where the queuing method is specified as fifo (first-in, first-out),
and the size of the per-component queue is limited to 100 requests.
Other queuing methods may also be specified, but a discussion of them
is beyond the scope of this document.
.Pp
The final section, the
.Sq debug
section, is optional. For more details
on this, the reader is referred to the RAIDframe documentation
discussed in the
.Sx HISTORY
section.
.Pp
See
.Sx EXAMPLES
for a more complete configuration file example.
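.Pp
The section syntax described above can be checked mechanically.
The following Python sketch, which is not part of
.Nm ,
splits a configuration file into its sections and rejects missing or
unrecognized section names.
The function name
.Sq parse_raid_config
is hypothetical, and the checks that the
.Sq disks
section lists rows times columns entries and that the
.Sq spare
section lists exactly the number of spares given in the
.Sq array
section are assumptions of this sketch rather than documented rules.
.Bd -literal -offset indent
REQUIRED = ("array", "disks", "layout", "queue")
OPTIONAL = ("spare", "debug")


def parse_raid_config(text):
    """Split a configuration into {section name: [words of each line]}.

    '#' starts a comment and 'START name' opens a new section.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        words = line.split()
        if words[0] == "START":
            if len(words) != 2:
                raise ValueError("malformed START line: " + line)
            current = words[1]
            sections[current] = []
        elif current is None:
            raise ValueError("data before the first START: " + line)
        else:
            sections[current].append(words)

    missing = [s for s in REQUIRED if s not in sections]
    unknown = [s for s in sections if s not in REQUIRED + OPTIONAL]
    if missing or unknown:
        raise ValueError("missing %s, unknown %s" % (missing, unknown))

    # Assumed consistency checks between the array line and the lists.
    array = sections["array"]
    if len(array) != 1 or len(array[0]) != 3:
        raise ValueError("array section needs one 'rows cols spares' line")
    rows, cols, spares = (int(n) for n in array[0])
    if len(sections["disks"]) != rows * cols:
        raise ValueError("disks section does not match rows * cols")
    if len(sections.get("spare", [])) != spares:
        raise ValueError("spare section does not match numSpare")
    return sections
.Ed
.Pp
Applied to the RAID 5 example in
.Sx EXAMPLES ,
this returns five sections; a misspelled section header such as
.Sq START drives
is rejected, because the required
.Sq disks
section is then missing.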
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Cm raid
device special files.
.El
.Sh EXAMPLES
It is highly recommended that, before using the RAID driver for real
file systems, the system administrator(s) become quite familiar
with the use of
.Nm "" ,
and that they understand how the component reconstruction process
works. The examples in this section will focus on configuring a
number of different RAID sets of varying degrees of redundancy.
By working through these examples, administrators should be able to
develop a good feel for how to configure a RAID set, and how to
initiate reconstruction of failed components.
.Pp
In the following examples
.Sq raid0
will be used to denote the RAID device. Depending on the
architecture,
.Sq /dev/rraid0c
or
.Sq /dev/rraid0d
may be used in place of
.Sq raid0 .
.Ss Initialization and Configuration
The initial step in configuring a RAID set is to identify the components
that will be used in the RAID set. All components should be the same
size. Each component should have a disklabel type of
.Dv FS_RAID ,
and a typical disklabel entry for a RAID component
might look like:
.Bd -literal -offset indent
f: 1800000 200495 RAID # (Cyl. 405*- 4041*)
.Ed
.Pp
While
.Dv FS_BSDFFS
will also work as the component type, the type
.Dv FS_RAID
is preferred for RAIDframe use, as it is required for features such as
auto-configuration. As part of the initial configuration of each RAID
set, each component will be given a
.Sq component label .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, the redundancy level of the RAID set, a
.Sq modification counter ,
and whether the parity information (if any) on that
component is known to be correct. Component labels are an integral
part of the RAID set, since they are used to ensure that components
are configured in the correct order, and to keep track of other
vital information about the RAID set. Component labels are also
required for the auto-detection and auto-configuration of RAID sets at
boot time. For a component label to be considered valid, that
particular component label must be in agreement with the other
component labels in the set. For example, the serial number,
.Sq modification counter ,
number of rows and number of columns must all
be in agreement. If any of these are different, then the component is
not considered to be part of the set. See
.Xr raid 4
for more information about component labels.
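.Pp
The agreement rule for component labels can be sketched in Python.
This is an illustration only: the
.Sq Label
class holds a hypothetical subset of the fields printed by
.Nm
.Fl g ,
and taking the most common label as the reference is a simplification,
not necessarily how the kernel decides which components belong to a set.
.Bd -literal -offset indent
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    serial: int        # "Serial Number"
    mod_counter: int   # "Mod Counter"
    num_rows: int      # "Num Rows"
    num_cols: int      # "Num Columns"


def disagreeing(labels):
    """Return the names of components whose label differs from the
    most common label in the set.  labels maps name -> Label."""
    reference, _ = Counter(labels.values()).most_common(1)[0]
    return sorted(name for name, lab in labels.items() if lab != reference)
.Ed
.Pp
For example, if /dev/sd1e and /dev/sd3e carry serial number 13432 and
modification counter 65 but /dev/sd2e still shows counter 60, only
/dev/sd2e is reported, matching the rule that a component whose label
disagrees is not considered part of the set.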
.Pp
Once the components have been identified, and the disks have
appropriate labels,
.Nm ""
is then used to configure the
.Xr raid 4
device. To configure the device, a configuration
file which looks something like the following is created:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
.Ed
.Pp
The above configuration file specifies a RAID 5
set consisting of the components /dev/sd1e, /dev/sd2e, and /dev/sd3e,
with /dev/sd4e available as a
.Sq hot spare
in case one of
the three main drives should fail. A RAID 0 set would be specified in
a similar way:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 4 0

START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100
.Ed
.Pp
In this case, devices /dev/sd10e, /dev/sd11e, /dev/sd12e, and /dev/sd13e
are the components that make up this RAID set. Note that there are no
hot spares for a RAID 0 set, since there is no way to recover data if
any of the components fail.
.Pp
For a RAID 1 (mirror) set, the following configuration might be used:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd20e
/dev/sd21e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
In this case, /dev/sd20e and /dev/sd21e are the two components of the
mirror set. While no hot spares have been specified in this
configuration, they easily could be, just as they were specified in
the RAID 5 case above. Note as well that RAID 1 sets are currently
limited to only 2 components. At present, n-way mirroring is not
possible.
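.Pp
The choice of RAID level also determines how much of the raw space
holds data.
The following Python sketch gives the usual first-order estimate for
one-row sets like those shown above; the function name is hypothetical,
and the sketch ignores hot spares as well as any space the driver
reserves on each component.
.Bd -literal -offset indent
def usable_blocks(level, num_cols, blocks_per_component):
    """Approximate data capacity of a one-row RAID set, in blocks."""
    if level == 0:
        data_cols = num_cols          # striping only, no redundancy
    elif level == 1:
        if num_cols != 2:
            raise ValueError("RAID 1 sets are limited to 2 components")
        data_cols = 1                 # the other component is the mirror
    elif level in (4, 5):
        data_cols = num_cols - 1      # one component's worth of parity
    else:
        raise ValueError("RAID level not covered by this sketch")
    return data_cols * blocks_per_component
.Ed
.Pp
Under this model the 3-column RAID 5 set above yields twice the
capacity of one component, the 4-column RAID 0 set four times, and the
RAID 1 set the capacity of a single component.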
.Pp
The first time a RAID set is configured, the
.Fl C
option must be used:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.Pp
where
.Sq raid0.conf
is the name of the RAID configuration file. The
.Fl C
option forces the configuration to succeed, even if any of the component
labels are incorrect. The
.Fl C
option should not be used lightly in
situations other than initial configurations, since if
the system is refusing to configure a RAID set, there is probably a
very good reason for it. After the initial configuration is done (and
appropriate component labels are added with the
.Fl I
option), raid0 can be configured normally with:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
When the RAID set is configured for the first time, it is
necessary to initialize the component labels, and to initialize the
parity on the RAID set. Initializing the component labels is done with:
.Bd -literal -offset indent
raidctl -I 112341 raid0
.Ed
.Pp
where
.Sq 112341
is a user-specified serial number for the RAID set. This
initialization step is
.Em required
for all RAID sets. In addition, using different
serial numbers for different RAID sets is
.Em strongly encouraged ,
as using the same serial number for all RAID sets will only serve to
decrease the usefulness of the component label checking.
.Pp
Initializing the RAID set is done via the
.Fl i
option. This initialization
.Em MUST
be done for
.Em all
RAID sets, since among other things it ensures that the parity (if
any) on the RAID set is correct. Since this initialization may be
quite time-consuming, the
.Fl v
option may also be used in conjunction with
.Fl i :
.Bd -literal -offset indent
raidctl -iv raid0
.Ed
.Pp
This will give more verbose output on the
status of the initialization:
.Bd -literal -offset indent
Initiating re-write of parity
Parity Re-write status:
10% |**** | ETA: 06:03 /
.Ed
.Pp
The output provides a
.Sq Percent Complete
in both a numeric and graphical format, as well as an estimated time
to completion of the operation.
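.Pp
A percentage bar and an estimated time to completion of this kind can
be derived from the fraction done and the elapsed time alone.
The Python sketch below assumes that the remaining work proceeds at the
average rate observed so far and prints the estimate as minutes and
seconds; both choices are assumptions, and neither the function name nor
the exact output format is taken from
.Nm .
.Bd -literal -offset indent
def progress_line(percent, elapsed_seconds, width=40):
    """Render 'NN% |****   | ETA: mm:ss' from a linear rate estimate."""
    filled = width * percent // 100
    bar = "*" * filled + " " * (width - filled)
    if percent <= 0:
        eta = "--:--"                 # no progress yet, so no estimate
    else:
        remaining = elapsed_seconds * (100 - percent) // percent
        eta = "%02d:%02d" % divmod(remaining, 60)
    return "%3d%% |%s| ETA: %s" % (percent, bar, eta)
.Ed
.Pp
For example, 10% complete after 40 seconds gives an estimate of
06:00 remaining.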
.Pp
Since it is the parity that provides the
.Sq redundancy
part of RAID, it is critical that the parity be kept correct
as much as possible. If the parity is not correct, then there is no
guarantee that data will not be lost if a component fails.
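.Pp
For the parity-based levels (4 and 5) the parity of a stripe is the
bitwise exclusive OR of its data units, which is what allows a missing
unit to be recomputed.
The Python sketch below is a simplified model that uses short byte
strings in place of stripe units; it shows a successful recomputation,
and the silent error that results when the parity is stale.
.Bd -literal -offset indent
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # data units of one stripe
parity = xor_blocks(data)

# A lost unit is the XOR of the surviving units and the parity.
assert xor_blocks([data[0], data[2], parity]) == data[1]

# If data changes without the parity being rewritten, the same
# computation returns wrong contents without raising any error.
data[0] = b"ZZZZ"
assert xor_blocks([data[0], data[2], parity]) != data[1]
.Ed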
.Pp
Once the parity is known to be correct,
it is then safe to perform
.Xr disklabel 8 ,
.Xr newfs 8 ,
or
.Xr fsck 8
on the device or its file systems, and then to mount the file systems
for use.
.Pp
Under certain circumstances (e.g. the additional component has not
arrived, or data is being migrated off of a disk destined to become a
component), it may be desirable to configure a RAID 1 set with only
a single component. This can be achieved by configuring the set with
a physically existing component (as either the first or second
component) and with a
.Sq fake
component. In the following:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd6e
/dev/sd0e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
/dev/sd0e is the real component, and will be the second disk of a RAID 1
set. The component /dev/sd6e, which must exist but have no physical
device associated with it, is simply used as a placeholder.
Configuration (using
.Fl C
and
.Fl I Ar 12345
as above) proceeds normally, but initialization of the RAID set will
have to wait until all physical components are present. After
configuration, this set can be used normally, but will be operating
in degraded mode. Once a second physical component is obtained, it
can be hot-added, the existing data mirrored, and normal operation
resumed.
.Ss Maintenance of the RAID set
After the parity has been initialized for the first time, the command:
.Bd -literal -offset indent
raidctl -p raid0
.Ed
.Pp
can be used to check the current status of the parity. To check the
parity and rebuild it if necessary (for example, after an unclean
shutdown), the command:
.Bd -literal -offset indent
raidctl -P raid0
.Ed
.Pp
is used. Note that re-writing the parity can be done while
other operations on the RAID set are taking place (e.g. while doing a
.Xr fsck 8
on a file system on the RAID set). However, for maximum effectiveness
of the RAID set, the parity should be known to be correct before any
data on the set is modified.
.Pp
To see how the RAID set is doing, the following command can be used to
show the RAID set's status:
.Bd -literal -offset indent
raidctl -s raid0
.Ed
.Pp
The output will look something like:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
Component label for /dev/sd1e:
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Component label for /dev/sd2e:
Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Component label for /dev/sd3e:
Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that all is well with the RAID set. Of importance here
are the component lines which read
.Sq optimal ,
and the
.Sq Parity status
line which indicates that the parity is up-to-date. Note that if
there are file systems open on the RAID set, the individual components
will not be
.Sq clean ,
but the set as a whole can still be clean.
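.Pp
Scripts that watch a RAID set usually need exactly these two facts.
The following Python sketch extracts the component states and the
parity status from output formatted like the example above; the
function names are hypothetical, and the parsing relies only on the
layout shown in this manual page, which may differ between versions of
.Nm .
.Bd -literal -offset indent
def parse_status(text):
    """Return ({component: state}, parity status) from raidctl -s output."""
    components, parity, in_components = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Components:":
            in_components = True
        elif line.startswith(("Spares:", "No spares.", "Component label")):
            in_components = False          # the component list has ended
        elif line.startswith("Parity status:"):
            parity = line.split(":", 1)[1].strip()
        elif in_components and ":" in line:
            name, state = line.rsplit(":", 1)
            components[name.strip()] = state.strip()
    return components, parity


def healthy(text):
    """True if every component is optimal and the parity is clean."""
    components, parity = parse_status(text)
    return (bool(components) and parity == "clean"
            and all(state == "optimal" for state in components.values()))
.Ed
.Pp
A monitoring job could, for instance, run
.Nm
.Fl s
periodically, pass its output to
.Sq healthy ,
and raise an alert when the result is false.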
.Pp
To check the component label of /dev/sd1e, the following is used:
.Bd -literal -offset indent
raidctl -g /dev/sd1e raid0
.Ed
.Pp
The output of this command will look something like:
.Bd -literal -offset indent
Component label for /dev/sd1e:
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
.Ed
.Ss Dealing with Component Failures
If for some reason
(perhaps to test reconstruction) it is necessary to pretend a drive
has failed, the following will perform that function:
.Bd -literal -offset indent
raidctl -f /dev/sd2e raid0
.Ed
.Pp
The system will then be performing all operations in degraded mode,
where missing data is re-computed from existing data and the parity.
In this case, obtaining the status of raid0 will return (in part):
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
.Ed
.Pp
Note that with the use of
.Fl f ,
a reconstruction has not been started. To both fail the disk and
start a reconstruction, the
.Fl F
option must be used:
.Bd -literal -offset indent
raidctl -F /dev/sd2e raid0
.Ed
.Pp
The
.Fl f
option may be used first, and then the
.Fl F
option used later, on the same disk, if desired.
Immediately after the reconstruction is started, the status will report:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: reconstructing
/dev/sd3e: optimal
Spares:
/dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that a reconstruction is in progress. To find out how
the reconstruction is progressing, the
.Fl S
option may be used. This will indicate the progress in terms of the
percentage of the reconstruction that is completed. When the
reconstruction is finished, the
.Fl s
option will show:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: spared
/dev/sd3e: optimal
Spares:
/dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
At this point there are at least two options. First, if /dev/sd2e is
known to be good (i.e. the failure was either caused by
.Fl f
or
.Fl F ,
or the failed disk was replaced), then a copyback of the data can
be initiated with the
.Fl B
option. In this example, this would copy the entire contents of
/dev/sd4e to /dev/sd2e. Once the copyback procedure is complete, the
status of the device would be (in part):
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
.Ed
.Pp
and the system is back to normal operation.
.Pp
The second option after the reconstruction is to simply use /dev/sd4e
in place of /dev/sd2e in the configuration file. For example, the
configuration file (in part) might now look like:
.Bd -literal -offset indent
START array
1 3 0

START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e
.Ed
.Pp
This can be done because /dev/sd4e is completely interchangeable with
/dev/sd2e at this point. Note that extreme care must be taken when
changing the order of the drives in a configuration. This is one of
the few instances where the devices and/or their orderings can be
changed without loss of data! In general, the ordering of components
in a configuration file should
.Em never
be changed.
.Pp
If a component fails and there are no hot spares
available on-line, the status of the RAID set might (in part) look like:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
No spares.
.Ed
.Pp
In this case there are a number of options. The first option is to add a hot
spare using:
.Bd -literal -offset indent
raidctl -a /dev/sd4e raid0
.Ed
.Pp
After the hot add, the status would then be:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
.Ed
.Pp
Reconstruction could then take place using
.Fl F
as described above.
.Pp
|
|
A second option is to rebuild directly onto /dev/sd2e. Once the disk
|
|
containing /dev/sd2e has been replaced, one can simply use:
|
|
.Bd -literal -offset indent
|
|
raidctl -R /dev/sd2e raid0
|
|
.Ed
|
|
.Pp
|
|
to rebuild the /dev/sd2e component. As the rebuilding is in progress,
|
|
the status will be:
|
|
.Bd -literal -offset indent
|
|
Components:
|
|
/dev/sd1e: optimal
|
|
/dev/sd2e: reconstructing
|
|
/dev/sd3e: optimal
|
|
No spares.
|
|
.Ed
|
|
.Pp
|
|
and when completed, will be:
|
|
.Bd -literal -offset indent
|
|
Components:
|
|
/dev/sd1e: optimal
|
|
/dev/sd2e: optimal
|
|
/dev/sd3e: optimal
|
|
No spares.
|
|
.Ed
|
|
.Pp
|
|
In circumstances where a particular component is completely
|
|
unavailable after a reboot, a special component name will be used to
|
|
indicate the missing component. For example:
|
|
.Bd -literal -offset indent
|
|
Components:
|
|
/dev/sd2e: optimal
|
|
component1: failed
|
|
No spares.
|
|
.Ed
|
|
.Pp
|
|
indicates that the second component of this RAID set was not detected
|
|
at all by the auto-configuration code. The name
|
|
.Sq component1
|
|
can be used anywhere a normal component name would be used. For
|
|
example, to add a hot spare to the above set and rebuild onto that hot
|
|
spare, the following could be done:
|
|
.Bd -literal -offset indent
|
|
raidctl -a /dev/sd3e raid0
|
|
raidctl -F component1 raid0
|
|
.Ed
|
|
.Pp
|
|
at which point the data missing from
|
|
.Sq component1
|
|
would be reconstructed onto /dev/sd3e.
|
|
.Pp
|
|
When more than one component is marked as
|
|
.Sq failed
|
|
due to a non-component hardware failure (e.g. loss of power to two
|
|
components, adapter problems, termination problems, or cabling issues), it
|
|
is quite possible to recover the data on the RAID set. The first
|
|
thing to be aware of is that the first disk to fail will almost certainly
|
|
be out-of-sync with the remainder of the array. If any IO was
|
|
performed between the time the first component is considered
|
|
.Sq failed
|
|
and when the second component is considered
|
|
.Sq failed ,
|
|
then the first component to fail will
|
|
.Ar not
|
|
contain correct data, and should be ignored. When the second
|
|
component is marked as failed, however, the RAID device will
|
|
(currently) panic the system. At this point the data on the RAID set
|
|
(not including the first failed component) is still self-consistent,
|
|
and will be in no worse state of repair than had the power gone out in
|
|
the middle of a write to a filesystem on a non-RAID device.
|
|
The problem, however, is that the component labels may now have 3
|
|
different 'modification counters' (one value on the first component
|
|
that failed, one value on the second component that failed, and a
|
|
third value on the remaining components). In such a situation, the
|
|
RAID set will not autoconfigure, and can only be forcibly re-configured
|
|
with the
|
|
.Fl C
|
|
option. To recover the RAID set, one must first remedy whatever physical
|
|
problem caused the multiple-component failure. After that is done,
|
|
the RAID set can be restored by forcibly configuring the RAID set
|
|
.Ar without
|
|
the component that failed first. For example, if /dev/sd1e and
|
|
/dev/sd2e fail (in that order) in a RAID set of the following
|
|
configuration:
|
|
.Bd -literal -offset indent
|
|
START array
|
|
1 4 0
|
|
|
|
START disks
|
|
/dev/sd1e
|
|
/dev/sd2e
|
|
/dev/sd3e
|
|
/dev/sd4e
|
|
|
|
START layout
|
|
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
|
|
64 1 1 5
|
|
|
|
START queue
|
|
fifo 100
|
|
|
|
.Ed
|
|
.Pp
|
|
then the following configuration (say "recover_raid0.conf")
|
|
.Bd -literal -offset indent
|
|
START array
|
|
1 4 0
|
|
|
|
START disks
|
|
/dev/sd6e
|
|
/dev/sd2e
|
|
/dev/sd3e
|
|
/dev/sd4e
|
|
|
|
START layout
|
|
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
|
|
64 1 1 5
|
|
|
|
START queue
|
|
fifo 100
|
|
.Ed
|
|
.Pp
|
|
(where /dev/sd6e has no physical device) can be used with
|
|
.Bd -literal -offset indent
|
|
raidctl -C recover_raid0.conf raid0
|
|
.Ed
|
|
.Pp
|
|
to force the configuration of raid0. A
|
|
.Bd -literal -offset indent
|
|
raidctl -I 12345 raid0
|
|
.Ed
|
|
.Pp
|
|
will be required in order to synchronize the component labels.
|
|
At this point, the file systems on the RAID set can be checked and
|
|
corrected. To complete the reconstruction of the RAID set,
|
|
/dev/sd1e is simply hot-added back into the array, and reconstructed
|
|
as described earlier.
|
|
.Ss RAID on RAID
|
|
RAID sets can be layered to create more complex and much larger RAID
|
|
sets. A RAID 0 set, for example, could be constructed from four RAID
|
|
5 sets. The following configuration file shows such a setup:
|
|
.Bd -literal -offset indent
|
|
START array
|
|
# numRow numCol numSpare
|
|
1 4 0
|
|
|
|
START disks
|
|
/dev/raid1e
|
|
/dev/raid2e
|
|
/dev/raid3e
|
|
/dev/raid4e
|
|
|
|
START layout
|
|
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
|
|
128 1 1 0
|
|
|
|
START queue
|
|
fifo 100
|
|
.Ed
|
|
.Pp
|
|
A similar configuration file might be used for a RAID 0 set
|
|
constructed from components on RAID 1 sets. In such a configuration,
|
|
the mirroring provides a high degree of redundancy, while the striping
|
|
provides additional speed benefits.
|
|
.Ss Auto-configuration and Root on RAID
|
|
RAID sets can also be auto-configured at boot. To make a set
|
|
auto-configurable, simply prepare the RAID set as above, and then use:
|
|
.Bd -literal -offset indent
|
|
raidctl -A yes raid0
|
|
.Ed
|
|
.Pp
|
|
to turn on auto-configuration for that set. To turn off
|
|
auto-configuration, use:
|
|
.Bd -literal -offset indent
|
|
raidctl -A no raid0
|
|
.Ed
|
|
.Pp
|
|
RAID sets which are auto-configurable will be configured before the
|
|
root file system is mounted. These RAID sets are thus available for
|
|
use as a root file system, or for any other file system. A primary
|
|
advantage of using the auto-configuration is that RAID components
|
|
become more independent of the disks they reside on. For example,
|
|
SCSI IDs can change, but auto-configured sets will always be
|
|
configured correctly, even if the SCSI IDs of the component disks
|
|
have become scrambled.
|
|
.Pp
|
|
Having a system's root file system
|
|
.Pq Pa /
|
|
on a RAID set is also allowed,
|
|
with the
|
|
.Sq a
|
|
partition of such a RAID set being used for
|
|
.Pa / .
|
|
To use raid0a as the root file system, simply use:
|
|
.Bd -literal -offset indent
|
|
raidctl -A root raid0
|
|
.Ed
|
|
.Pp
|
|
To return raid0 to being just an auto-configuring set, simply use the
|
|
.Fl A Ar yes
|
|
arguments.
|
|
.Pp
|
|
Note that kernels can only be directly read from RAID 1 components on
|
|
alpha and pmax architectures. On those architectures, the
|
|
.Dv FS_RAID
|
|
file system is recognized by the bootblocks, and will properly load the
|
|
kernel directly from a RAID 1 component. For other architectures, or
|
|
to support the root file system on other RAID sets, some other
|
|
mechanism must be used to get a kernel booting. For example, a small
|
|
partition containing only the secondary boot-blocks and an alternate
|
|
kernel (or two) could be used. Once a kernel is booting, however, and
|
|
an auto-configuring RAID set is found that is eligible to be root,
|
|
then that RAID set will be auto-configured and used as the root
|
|
device. If two or more RAID sets claim to be root devices, then the
|
|
user will be prompted to select the root device. At this time, RAID
|
|
0, 1, 4, and 5 sets are all supported as root devices.
|
|
.Pp
|
|
A typical RAID 1 setup with root on RAID might be as follows:
|
|
.Bl -enum
|
|
.It
|
|
wd0a - a small partition, which contains a complete, bootable, basic
|
|
.Nx
|
|
installation.
|
|
.It
|
|
wd1a - also contains a complete, bootable, basic
|
|
.Nx
|
|
installation.
|
|
.It
|
|
wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
|
|
.It
|
|
wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
|
|
swap space.
|
|
.It
|
|
wd0g and wd1g - a RAID 1 set, raid2, used for
|
|
.Pa /usr ,
|
|
.Pa /home ,
|
|
or other data, if desired.
|
|
.It
|
|
wd0h and wd1h - a RAID 1 set, raid3, if desired.
|
|
.El
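.Pp
As a sketch of the first of these sets, a configuration file for raid0
(built from wd0e and wd1e) might look like the following.
The stripe unit size of 128 sectors and the
.Dq fifo 100
queue setting are illustrative assumptions, not requirements; the other
RAID 1 sets would use the same layout with their own components.
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/wd0e
/dev/wd1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed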
|
|
.Pp
|
|
RAID sets raid0, raid1, and raid2 are all marked as
|
|
auto-configurable. raid0 is marked as being a root file system.
|
|
When new kernels are installed, the kernel is not only copied to
|
|
.Pa / ,
|
|
but also to wd0a and wd1a. The kernel on wd0a is required, since that
|
|
is the kernel the system boots from. The kernel on wd1a is also
|
|
required, since that will be the kernel used should wd0 fail. The
|
|
important point here is to have redundant copies of the kernel
|
|
available, in the event that one of the drives fails.
|
|
.Pp
|
|
There is no requirement that the root file system be on the same disk
|
|
as the kernel. For example, obtaining the kernel from wd0a and using
sd0e and sd1e for raid0 (and the root file system) is fine. It
|
|
.Ar is
|
|
critical, however, that there be multiple kernels available, in the
|
|
event of media failure.
|
|
.Pp
|
|
Multi-layered RAID devices (such as a RAID 0 set made
|
|
up of RAID 1 sets) are
|
|
.Ar not
|
|
supported as root devices or auto-configurable devices at this point.
|
|
(Multi-layered RAID devices
|
|
.Ar are
|
|
supported in general, however, as mentioned earlier.) Note that in
|
|
order to enable component auto-detection and auto-configuration of
|
|
RAID devices, the line:
|
|
.Bd -literal -offset indent
|
|
options RAID_AUTOCONFIG
|
|
.Ed
|
|
.Pp
|
|
must be in the kernel configuration file. See
|
|
.Xr raid 4
|
|
for more details.
|
|
.Ss Unconfiguration
|
|
The final operation performed by
|
|
.Nm
|
|
is to unconfigure a
|
|
.Xr raid 4
|
|
device. This is accomplished via a simple:
|
|
.Bd -literal -offset indent
|
|
raidctl -u raid0
|
|
.Ed
|
|
.Pp
|
|
at which point the device is ready to be reconfigured.
|
|
.Ss Performance Tuning
|
|
Selection of the various parameter values which result in the best
|
|
performance can be quite tricky, and often requires a bit of
|
|
trial-and-error to get those values most appropriate for a given system.
|
|
A whole range of factors come into play, including:
|
|
.Bl -enum
|
|
.It
|
|
Types of components (e.g. SCSI vs. IDE) and their bandwidth
|
|
.It
|
|
Types of controller cards and their bandwidth
|
|
.It
|
|
Distribution of components among controllers
|
|
.It
|
|
IO bandwidth
|
|
.It
|
|
File system access patterns
|
|
.It
|
|
CPU speed
|
|
.El
|
|
.Pp
|
|
As with most performance tuning, benchmarking under real-life loads
|
|
may be the only way to measure expected performance. Understanding
|
|
some of the underlying technology is also useful in tuning. The goal
|
|
of this section is to provide pointers to those parameters which may
|
|
make significant differences in performance.
|
|
.Pp
|
|
For a RAID 1 set, a SectPerSU value of 64 or 128 is typically
|
|
sufficient. Since data in a RAID 1 set is arranged in a linear
|
|
fashion on each component, selecting an appropriate stripe size is
|
|
somewhat less critical than it is for a RAID 5 set. However, a stripe
|
|
size that is too small will cause large IOs to be broken up into a
|
|
number of smaller ones, hurting performance. At the same time, a
|
|
large stripe size may cause problems with concurrent accesses to
|
|
stripes, which may also affect performance. Thus values in the range
|
|
of 32 to 128 are often the most effective.
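.Pp
As a simplified model of the first effect, the following shell sketch
counts how many stripe-unit-sized pieces a single 64K IO (128 sectors,
assuming 512-byte sectors) would be split into for several SectPerSU
values.
It assumes the IO starts on a stripe unit boundary, and the function
name pieces_for is purely illustrative.
.Bd -literal -offset indent
# Number of stripe units touched by an IO that starts on a
# stripe unit boundary.
# $1 = IO size in sectors, $2 = SectPerSU (sectors per stripe unit)
pieces_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

for sect_per_su in 16 32 64 128; do
    echo "SectPerSU=$sect_per_su: $(pieces_for 128 "$sect_per_su") piece(s)"
done
.Ed
.Pp
With a SectPerSU of 16, such an IO is broken into 8 smaller IOs, while
with a SectPerSU of 128 it remains a single IO.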
|
|
.Pp
|
|
Tuning RAID 5 sets is trickier. In the best case, IO is presented to
|
|
the RAID set one stripe at a time. Since the entire stripe is
|
|
available at the beginning of the IO, the parity of that stripe can
|
|
be calculated before the stripe is written, and then the stripe data
|
|
and parity can be written in parallel. When the amount of data being
|
|
written is less than a full stripe's worth, the
|
|
.Sq small write
|
|
problem occurs. Since a
|
|
.Sq small write
|
|
means only a portion of the stripe on the components is going to
|
|
change, the data (and parity) on the components must be updated
|
|
slightly differently. First, the
|
|
.Sq old parity
|
|
and
|
|
.Sq old data
|
|
must be read from the components. Then the new parity is constructed,
|
|
using the new data to be written, and the old data and old parity.
|
|
Finally, the new data and new parity are written. All this extra data
|
|
shuffling results in a serious loss of performance, and is typically 2
|
|
to 4 times slower than a full stripe write (or read). To combat this
|
|
problem in the real world, it may be useful to ensure that stripe
|
|
sizes are small enough that a
|
|
.Sq large IO
|
|
from the system will use exactly one large stripe write. As discussed
below, there are some file system dependencies which may come into play
here as well.
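.Pp
The small-write update works because RAID 5 parity is the exclusive-or
(XOR) of the data in a stripe, so the new parity can be derived from the
old parity, the old data, and the new data alone.
The following shell sketch illustrates this with small integers standing
in for the contents of four data stripe units; the values and variable
names are hypothetical, and this is not how RAIDframe itself is coded.
.Bd -literal -offset indent
# Hypothetical contents of the four data stripe units in one stripe.
d0=5; d1=12; d2=7; d3=9

# Parity of the full stripe is the XOR of all data units.
old_parity=$(( d0 ^ d1 ^ d2 ^ d3 ))

# A small write replaces only d1.
old_d1=$d1
new_d1=3

# Read-modify-write: only old_d1 and old_parity are read;
# new parity = old parity XOR old data XOR new data.
rmw_parity=$(( old_parity ^ old_d1 ^ new_d1 ))

# Cross-check against recomputing parity over the updated stripe.
d1=$new_d1
full_parity=$(( d0 ^ d1 ^ d2 ^ d3 ))

echo "read-modify-write parity: $rmw_parity"
echo "full-stripe parity:       $full_parity"
.Ed
.Pp
Both methods yield the same parity, but the read-modify-write path costs
two extra reads before the two writes, which is the source of the
small-write penalty.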
|
|
.Pp
|
|
Since the size of a
|
|
.Sq large IO
|
|
is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
|
|
be desirable to select a SectPerSU value of 16 blocks (8K) or 32
|
|
blocks (16K). Since each stripe holds 4 stripe units of data, the maximum
|
|
data per stripe is 64 blocks (32K) or 128 blocks (64K). Again,
|
|
empirical measurement will provide the best indicators of which
|
|
values will yield better performance.
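.Pp
The arithmetic behind these figures can be sketched as follows,
assuming 512-byte sectors (the same assumption behind the 8K and 16K
figures above); the variable names are illustrative only.
.Bd -literal -offset indent
# 5-drive RAID 5: one stripe unit per stripe holds parity,
# so 4 stripe units per stripe carry data.
ncomponents=5
data_units=$(( ncomponents - 1 ))
sector_bytes=512

for sect_per_su in 16 32; do
    su_kb=$(( sect_per_su * sector_bytes / 1024 ))
    stripe_sectors=$(( sect_per_su * data_units ))
    stripe_kb=$(( stripe_sectors * sector_bytes / 1024 ))
    echo "SectPerSU=$sect_per_su: unit ${su_kb}K, stripe data $stripe_sectors blocks (${stripe_kb}K)"
done
.Ed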
|
|
.Pp
|
|
The parameters used for the file system are also critical to good
|
|
performance. For
|
|
.Xr newfs 8 ,
|
|
for example, increasing the block size to 32K or 64K may improve
|
|
performance dramatically. In addition, changing the cylinders-per-group
|
|
parameter from 16 to 32 or higher is often not only necessary for
|
|
larger file systems, but may also have positive performance
|
|
implications.
|
|
.Ss Summary
|
|
Despite the length of this man-page, configuring a RAID set is a
|
|
relatively straightforward process. All that needs to be done is to
follow these steps:
|
|
.Bl -enum
|
|
.It
|
|
Use
|
|
.Xr disklabel 8
|
|
to create the components (of type RAID).
|
|
.It
|
|
Construct a RAID configuration file: e.g.
|
|
.Sq raid0.conf
|
|
.It
|
|
Configure the RAID set with:
|
|
.Bd -literal -offset indent
|
|
raidctl -C raid0.conf raid0
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Initialize the component labels with:
|
|
.Bd -literal -offset indent
|
|
raidctl -I 123456 raid0
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Initialize (rewrite) the parity on the set with:
|
|
.Bd -literal -offset indent
|
|
raidctl -i raid0
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Get the default label for the RAID set:
|
|
.Bd -literal -offset indent
|
|
disklabel raid0 > /tmp/label
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Edit the label:
|
|
.Bd -literal -offset indent
|
|
vi /tmp/label
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Put the new label on the RAID set:
|
|
.Bd -literal -offset indent
|
|
disklabel -R -r raid0 /tmp/label
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Create the file system:
|
|
.Bd -literal -offset indent
|
|
newfs /dev/rraid0e
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Mount the file system:
|
|
.Bd -literal -offset indent
|
|
mount /dev/raid0e /mnt
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Use:
|
|
.Bd -literal -offset indent
|
|
raidctl -c raid0.conf raid0
|
|
.Ed
|
|
.Pp
|
|
to re-configure the RAID set the next time it is needed, or put
raid0.conf into /etc, where it will automatically be started by
the /etc/rc scripts.
|
|
.El
|
|
.Sh SEE ALSO
|
|
.Xr ccd 4 ,
|
|
.Xr raid 4 ,
|
|
.Xr rc 8
|
|
.Sh HISTORY
|
|
RAIDframe is a framework for rapid prototyping of RAID structures
|
|
developed by the folks at the Parallel Data Laboratory at Carnegie
|
|
Mellon University (CMU).
|
|
A more complete description of the internals and functionality of
|
|
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
|
|
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
|
|
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
|
|
Parallel Data Laboratory of Carnegie Mellon University.
|
|
.Pp
|
|
The
|
|
.Nm
|
|
command first appeared as a program in CMU's RAIDframe v1.1 distribution. This
|
|
version of
|
|
.Nm
|
|
is a complete re-write, and first appeared in
|
|
.Nx 1.4 .
|
|
.Sh COPYRIGHT
|
|
.Bd -literal
|
|
The RAIDframe Copyright is as follows:
|
|
|
|
Copyright (c) 1994-1996 Carnegie-Mellon University.
|
|
All rights reserved.
|
|
|
|
Permission to use, copy, modify and distribute this software and
|
|
its documentation is hereby granted, provided that both the copyright
|
|
notice and this permission notice appear in all copies of the
|
|
software, derivative works or modified versions, and any portions
|
|
thereof, and that both notices appear in supporting documentation.
|
|
|
|
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
|
|
CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
|
|
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
|
|
|
|
Carnegie Mellon requests users of this software to return to
|
|
|
|
Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
|
|
School of Computer Science
|
|
Carnegie Mellon University
|
|
Pittsburgh PA 15213-3890
|
|
|
|
any improvements or extensions that they make and grant Carnegie the
|
|
rights to redistribute these changes.
|
|
.Ed
|
|
.Sh WARNINGS
|
|
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
|
|
data loss due to component failure. However, the loss of two
|
|
components of a RAID 4 or 5 system, or the loss of a single component
|
|
of a RAID 0 system will result in the entire file system being lost.
|
|
RAID is
|
|
.Ar NOT
|
|
a substitute for good backup practices.
|
|
.Pp
|
|
Recomputation of parity
|
|
.Ar MUST
|
|
be performed whenever there is a chance that it may have been
|
|
compromised. This includes after system crashes, or before a RAID
|
|
device has been used for the first time. Failure to keep parity
|
|
correct will be catastrophic should a component ever fail -- it is
|
|
better to use RAID 0 and get the additional space and speed, than it
|
|
is to use parity, but not keep the parity correct. At least with RAID
|
|
0 there is no perception of increased data security.
|
|
.Sh BUGS
|
|
Hot-spare removal is currently not available.
|