.\" $NetBSD: raidctl.8,v 1.32 2002/10/01 14:20:26 wiz Exp $
.\"
.\" Copyright (c) 1998, 2002 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the NetBSD
.\" Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\" contributors may be used to endorse or promote products derived
.\" from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
.\" School of Computer Science
.\" Carnegie Mellon University
.\" Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd July 10, 2001
.Dt RAIDCTL 8
.Os
.Sh NAME
.Nm raidctl
.Nd configuration utility for the RAIDframe disk driver
.Sh SYNOPSIS
.Nm ""
.Op Fl v
.Fl a Ar component Ar dev
.Nm ""
.Op Fl v
.Fl A Op yes | no | root
.Ar dev
.Nm ""
.Op Fl v
.Fl B Ar dev
.Nm ""
.Op Fl v
.Fl c Ar config_file Ar dev
.Nm ""
.Op Fl v
.Fl C Ar config_file Ar dev
.Nm ""
.Op Fl v
.Fl f Ar component Ar dev
.Nm ""
.Op Fl v
.Fl F Ar component Ar dev
.Nm ""
.Op Fl v
.Fl g Ar component Ar dev
.Nm ""
.Op Fl v
.Fl G Ar dev
.Nm ""
.Op Fl v
.Fl i Ar dev
.Nm ""
.Op Fl v
.Fl I Ar serial_number Ar dev
.Nm ""
.Op Fl v
.Fl p Ar dev
.Nm ""
.Op Fl v
.Fl P Ar dev
.Nm ""
.Op Fl v
.Fl r Ar component Ar dev
.Nm ""
.Op Fl v
.Fl R Ar component Ar dev
.Nm ""
.Op Fl v
.Fl s Ar dev
.Nm ""
.Op Fl v
.Fl S Ar dev
.Nm ""
.Op Fl v
.Fl u Ar dev
.Sh DESCRIPTION
.Nm
is the user-land control program for
.Xr raid 4 ,
the RAIDframe disk device.
.Nm
is primarily used to dynamically configure and unconfigure RAIDframe disk
devices.
For more information about the RAIDframe disk device, see
.Xr raid 4 .
.Pp
This document assumes the reader has at least rudimentary knowledge of
RAID and RAID concepts.
.Pp
The command-line options for
.Nm
are as follows:
.Bl -tag -width indent
.It Fl a Ar component Ar dev
Add
.Ar component
as a hot spare for the device
.Ar dev .
.It Fl A Ic yes Ar dev
Make the RAID set auto-configurable.
The RAID set will be automatically configured at boot
.Em before
the root file system is mounted.
Note that all components of the set must be of type
.Dv RAID
in the disklabel.
.It Fl A Ic no Ar dev
Turn off auto-configuration for the RAID set.
.It Fl A Ic root Ar dev
Make the RAID set auto-configurable, and also mark the set as being
eligible to be the root partition.
A RAID set configured this way will
.Em override
the use of the boot disk as the root device.
All components of the set must be of type
.Dv RAID
in the disklabel.
Note that the kernel being booted must currently reside on a non-RAID set.
.It Fl B Ar dev
Initiate a copyback of reconstructed data from a spare disk to
its original disk.
This is performed after a component has failed,
and the failed drive has been reconstructed onto a spare drive.
.It Fl c Ar config_file Ar dev
Configure the RAIDframe device
.Ar dev
according to the configuration given in
.Ar config_file .
A description of the contents of
.Ar config_file
is given later.
.It Fl C Ar config_file Ar dev
As for
.Fl c ,
but forces the configuration to take place.
This is required the first time a RAID set is configured.
.It Fl f Ar component Ar dev
This marks the specified
.Ar component
as having failed, but does not initiate a reconstruction of that component.
.It Fl F Ar component Ar dev
Fails the specified
.Ar component
of the device, and immediately begins a reconstruction of the failed
disk onto an available hot spare.
This is one of the mechanisms used to start
the reconstruction process if a component does have a hardware failure.
.It Fl g Ar component Ar dev
Get the component label for the specified component.
.It Fl G Ar dev
Generate the configuration of the RAIDframe device in a format suitable for
use with the
.Fl c
or
.Fl C
options.
.It Fl i Ar dev
Initialize the RAID device.
In particular, (re-)write the parity on the selected device.
This
.Em MUST
be done for
.Em all
RAID sets before the RAID device is labeled and before
file systems are created on the RAID device.
.It Fl I Ar serial_number Ar dev
Initialize the component labels on each component of the device.
.Ar serial_number
is used as one of the keys in determining whether a
particular set of components belongs to the same RAID set.
While not strictly enforced, different serial numbers should be used for
different RAID sets.
This step
.Em MUST
be performed when a new RAID set is created.
.It Fl p Ar dev
Check the status of the parity on the RAID set.
Displays a status message,
and returns successfully if the parity is up-to-date.
.It Fl P Ar dev
Check the status of the parity on the RAID set, and initialize
(re-write) the parity if the parity is not known to be up-to-date.
This is normally used after a system crash (and before a
.Xr fsck 8 )
to ensure the integrity of the parity.
.It Fl r Ar component Ar dev
Remove the spare disk specified by
.Ar component
from the set of available spare components.
.It Fl R Ar component Ar dev
Fails the specified
.Ar component ,
if necessary, and immediately begins a reconstruction back to
.Ar component .
This is useful for reconstructing back onto a component after
it has been replaced following a failure.
.It Fl s Ar dev
Display the status of the RAIDframe device for each of the components
and spares.
.It Fl S Ar dev
Check the status of parity re-writing, component reconstruction, and
component copyback.
The output indicates the amount of progress
achieved in each of these areas.
.It Fl u Ar dev
Unconfigure the RAIDframe device.
.It Fl v
Be more verbose.
For operations such as reconstructions, parity
re-writing, and copybacks, provide a progress indicator.
.El
.Pp
The device used by
.Nm
is specified by
.Ar dev .
.Ar dev
may be either the full name of the device, e.g.,
.Pa /dev/rraid0d ,
for the i386 architecture, or
.Pa /dev/rraid0c
for many others, or just simply
.Pa raid0
(for
.Pa /dev/rraid0[cd] ) .
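.Pp
As a brief illustration of the naming rule above, the following two
invocations would be expected to refer to the same device on the i386
architecture.
This is only a sketch; the choice of the
.Fl s
option is arbitrary:
.Bd -literal -offset indent
raidctl -s raid0
raidctl -s /dev/rraid0d
.Ed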
.Ss Configuration file
The format of the configuration file is complex, and
only an abbreviated treatment is given here.
In the configuration files, a
.Sq #
indicates the beginning of a comment.
.Pp
There are 4 required sections of a configuration file, and 2
optional sections.
Each section begins with a
.Sq START ,
followed by the section name,
and the configuration parameters associated with that section.
The first section is the
.Sq array
section, and it specifies
the number of rows, columns, and spare disks in the RAID set.
For example:
.Bd -literal -offset indent
START array
1 3 0
.Ed
.Pp
indicates an array with 1 row, 3 columns, and 0 spare disks.
Note that although multi-dimensional arrays may be specified, they are
.Em NOT
supported in the driver.
.Pp
The second section, the
.Sq disks
section, specifies the actual components of the device.
For example:
.Bd -literal -offset indent
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
.Ed
.Pp
specifies the three component disks to be used in the RAID device.
If any of the specified drives cannot be found when the RAID device is
configured, then they will be marked as
.Sq failed ,
and the system will operate in degraded mode.
Note that it is
.Em imperative
that the order of the components in the configuration file does not
change between configurations of a RAID device.
Changing the order of the components will result in data loss
if the set is configured with the
.Fl C
option.
In normal circumstances, the RAID set will not configure if only
.Fl c
is specified, and the components are out-of-order.
.Pp
The next section, which is the
.Sq spare
section, is optional, and, if present, specifies the devices to be used as
.Sq hot spares
\(em devices which are on-line,
but are not actively used by the RAID driver unless
one of the main components fails.
A simple
.Sq spare
section might be:
.Bd -literal -offset indent
START spare
/dev/sd3e
.Ed
.Pp
for a configuration with a single spare component.
If no spare drives are to be used in the configuration, then the
.Sq spare
section may be omitted.
.Pp
The next section is the
.Sq layout
section.
This section describes the general layout parameters for the RAID device,
and provides such information as
sectors per stripe unit,
stripe units per parity unit,
stripe units per reconstruction unit,
and the parity configuration to use.
This section might look like:
.Bd -literal -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
.Ed
.Pp
The sectors per stripe unit specifies, in blocks, the interleave
factor; i.e., the number of contiguous sectors to be written to each
component for a single stripe.
Appropriate selection of this value (32 in this example)
is the subject of much research in RAID architectures.
The stripe units per parity unit and
stripe units per reconstruction unit are normally each set to 1.
While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 is outside
the scope of this document.
The last value in this section (5 in this example)
indicates the parity configuration desired.
Valid entries include:
.Bl -tag -width inde
.It 0
RAID level 0.
No parity, only simple striping.
.It 1
RAID level 1.
Mirroring.
The parity is the mirror.
.It 4
RAID level 4.
Striping across components, with parity stored on the last component.
.It 5
RAID level 5.
Striping across components, parity distributed across all components.
.El
.Pp
There are other valid entries here, including those for Even-Odd
parity, RAID level 5 with rotated sparing, Chained declustering,
and Interleaved declustering, but as of this writing the code for
those parity operations has not been tested with
.Nx .
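.Pp
As a rough worked example of the sectors per stripe unit value
discussed above, assume 512-byte sectors and a 3-column RAID 5 set.
Both are assumptions made for illustration; the figures change with
other sector sizes or column counts:
.Bd -literal -offset indent
stripe unit  = 32 sectors * 512 bytes = 16384 bytes (16 KB)
full stripe  = 3 stripe units         = 48 KB on disk
data portion = (3 - 1) * 16 KB        = 32 KB per stripe
parity       = 1 * 16 KB              = 16 KB per stripe
.Ed
.Pp
Under these assumptions, a 32 KB write aligned to a stripe boundary
would cover the data portion of exactly one stripe.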
.Pp
The next required section is the
.Sq queue
section.
This is most often specified as:
.Bd -literal -offset indent
START queue
fifo 100
.Ed
.Pp
where the queuing method is specified as fifo (first-in, first-out),
and the size of the per-component queue is limited to 100 requests.
Other queuing methods may also be specified, but a discussion of them
is beyond the scope of this document.
.Pp
The final section, the
.Sq debug
section, is optional.
For more details on this the reader is referred to
the RAIDframe documentation discussed in the
.Sx HISTORY
section.
.Pp
See
.Sx EXAMPLES
for a more complete configuration file example.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Cm raid
device special files.
.El
.Sh EXAMPLES
It is highly recommended that, before using the RAID driver for real
file systems, the system administrator(s) become quite familiar
with the use of
.Nm "" ,
and that they understand how the component reconstruction process works.
The examples in this section will focus on configuring a
number of different RAID sets of varying degrees of redundancy.
By working through these examples, administrators should be able to
develop a good feel for how to configure a RAID set, and how to
initiate reconstruction of failed components.
.Pp
In the following examples
.Sq raid0
will be used to denote the RAID device.
Depending on the architecture,
.Pa /dev/rraid0c
or
.Pa /dev/rraid0d
may be used in place of
.Pa raid0 .
.Ss Initialization and Configuration
The initial step in configuring a RAID set is to identify the components
that will be used in the RAID set.
All components should be the same size.
Each component should have a disklabel type of
.Dv FS_RAID ,
and a typical disklabel entry for a RAID component might look like:
.Bd -literal -offset indent
f: 1800000 200495 RAID # (Cyl. 405*- 4041*)
.Ed
.Pp
While
.Dv FS_BSDFFS
will also work as the component type, the type
.Dv FS_RAID
is preferred for RAIDframe use, as it is required for features such as
auto-configuration.
As part of the initial configuration of each RAID set,
each component will be given a
.Sq component label .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, the redundancy level of the RAID set, a
.Sq modification counter ,
and whether the parity information (if any) on that
component is known to be correct.
Component labels are an integral part of the RAID set,
since they are used to ensure that components
are configured in the correct order, and used to keep track of other
vital information about the RAID set.
Component labels are also required for the auto-detection
and auto-configuration of RAID sets at boot time.
For a component label to be considered valid, that
particular component label must be in agreement with the other
component labels in the set.
For example, the serial number,
.Sq modification counter ,
number of rows and number of columns must all be in agreement.
If any of these are different, then the component is
not considered to be part of the set.
See
.Xr raid 4
for more information about component labels.
.Pp
Once the components have been identified, and the disks have
appropriate labels,
.Nm
is then used to configure the
.Xr raid 4
device.
To configure the device, a configuration file which looks something like:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
.Ed
.Pp
is created in a file.
The above configuration file specifies a RAID 5
set consisting of the components
.Pa /dev/sd1e ,
.Pa /dev/sd2e ,
and
.Pa /dev/sd3e ,
with
.Pa /dev/sd4e
available as a
.Sq hot spare
in case one of the three main drives should fail.
A RAID 0 set would be specified in a similar way:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 4 0

START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100
.Ed
.Pp
In this case, devices
.Pa /dev/sd10e ,
.Pa /dev/sd11e ,
.Pa /dev/sd12e ,
and
.Pa /dev/sd13e
are the components that make up this RAID set.
Note that there are no hot spares for a RAID 0 set,
since there is no way to recover data if any of the components fail.
.Pp
For a RAID 1 (mirror) set, the following configuration might be used:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd20e
/dev/sd21e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
In this case,
.Pa /dev/sd20e
and
.Pa /dev/sd21e
are the two components of the mirror set.
While no hot spares have been specified in this
configuration, they easily could be, just as they were specified in
the RAID 5 case above.
Note as well that RAID 1 sets are currently limited to only 2 components.
At present, n-way mirroring is not possible.
.Pp
The first time a RAID set is configured, the
.Fl C
option must be used:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.Pp
where
.Pa raid0.conf
is the name of the RAID configuration file.
The
.Fl C
option forces the configuration to succeed, even if any of the component
labels are incorrect.
The
.Fl C
option should not be used lightly in
situations other than initial configurations: if
the system is refusing to configure a RAID set, there is probably a
very good reason for it.
After the initial configuration is done (and
appropriate component labels are added with the
.Fl I
option) then raid0 can be configured normally with:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
When the RAID set is configured for the first time, it is
necessary to initialize the component labels, and to initialize the
parity on the RAID set.
Initializing the component labels is done with:
.Bd -literal -offset indent
raidctl -I 112341 raid0
.Ed
.Pp
where
.Sq 112341
is a user-specified serial number for the RAID set.
This initialization step is
.Em required
for all RAID sets.
As well, using different serial numbers between RAID sets is
.Em strongly encouraged ,
as using the same serial number for all RAID sets will only serve to
decrease the usefulness of the component label checking.
.Pp
Initializing the RAID set is done via the
.Fl i
option.
This initialization
.Em MUST
be done for
.Em all
RAID sets, since among other things it verifies that the parity (if
any) on the RAID set is correct.
Since this initialization may be quite time-consuming, the
.Fl v
option may also be used in conjunction with
.Fl i :
.Bd -literal -offset indent
raidctl -iv raid0
.Ed
.Pp
This will give more verbose output on the
status of the initialization:
.Bd -literal -offset indent
Initiating re-write of parity
Parity Re-write status:
10% |**** | ETA: 06:03 /
.Ed
.Pp
The output provides a
.Sq Percent Complete
in both a numeric and graphical format, as well as an estimated time
to completion of the operation.
.Pp
Since it is the parity that provides the
.Sq redundancy
part of RAID, it is critical that the parity is correct as much of the
time as possible.
If the parity is not correct, then there is no
guarantee that data will not be lost if a component fails.
.Pp
Once the parity is known to be correct, it is then safe to perform
.Xr disklabel 8 ,
.Xr newfs 8 ,
or
.Xr fsck 8
on the device or its file systems, and then to mount the file systems
for use.
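.Pp
Putting the steps of this section together, a first-time setup might be
sketched as follows.
The configuration file name and serial number are the ones used above;
the partition
.Pa /dev/rraid0e
and the mount point
.Pa /mnt
are assumptions made only for illustration:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0   # force the initial configuration
raidctl -I 112341 raid0       # write the component labels
raidctl -iv raid0             # initialize parity, showing progress
disklabel -e raid0            # label the new RAID device
newfs /dev/rraid0e            # create a file system (assumed partition)
mount /dev/raid0e /mnt        # mount it (assumed mount point)
.Ed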
.Pp
Under certain circumstances (e.g., the additional component has not
arrived, or data is being migrated off of a disk destined to become a
component) it may be desirable to configure a RAID 1 set with only
a single component.
This can be achieved by configuring the set with a physically existing
component (as either the first or second component) and with a
.Sq fake
component.
In the following:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd6e
/dev/sd0e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
.Pa /dev/sd0e
is the real component, and will be the second disk of a RAID 1 set.
The component
.Pa /dev/sd6e ,
which must exist, but have no physical device associated with it,
is simply used as a placeholder.
Configuration (using
.Fl C
and
.Fl I Ar 12345
as above) proceeds normally, but initialization of the RAID set will
have to wait until all physical components are present.
After configuration, this set can be used normally, but will be operating
in degraded mode.
Once a second physical component is obtained, it can be hot-added,
the existing data mirrored, and normal operation resumed.
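.Pp
One way the last step might be carried out is sketched below, using the
hot spare mechanism described under
.Sx Dealing with Component Failures .
The name
.Pa /dev/sd7e
for the newly arrived component is hypothetical, and the placeholder
should be named exactly as the
.Fl s
option reports it, which may differ from the name shown here:
.Bd -literal -offset indent
raidctl -a /dev/sd7e raid0    # hot-add the new component as a spare
raidctl -F /dev/sd6e raid0    # rebuild the placeholder onto the spare
raidctl -S raid0              # check the reconstruction progress
.Ed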
.Ss Maintenance of the RAID set
After the parity has been initialized for the first time, the command:
.Bd -literal -offset indent
raidctl -p raid0
.Ed
.Pp
can be used to check the current status of the parity.
To check the parity and rebuild it if necessary (for example,
after an unclean shutdown) the command:
.Bd -literal -offset indent
raidctl -P raid0
.Ed
.Pp
is used.
Note that re-writing the parity can be done while
other operations on the RAID set are taking place (e.g., while doing a
.Xr fsck 8
on a file system on the RAID set).
However, for maximum effectiveness of the RAID set, the parity should be
known to be correct before any data on the set is modified.
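.Pp
For instance, recovery after an unclean shutdown might be ordered as in
the following sketch, so that the parity is dealt with before the file
system is checked.
The partition name
.Pa /dev/rraid0e
is an assumption:
.Bd -literal -offset indent
raidctl -P raid0        # re-write parity if it is not known to be good
fsck /dev/rraid0e       # then check the file system (assumed partition)
.Ed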
.Pp
To see how the RAID set is doing, the following command can be used to
show the RAID set's status:
.Bd -literal -offset indent
raidctl -s raid0
.Ed
.Pp
The output will look something like:
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: optimal
   /dev/sd3e: optimal
Spares:
   /dev/sd4e: spare
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5 blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd2e:
   Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5 blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd3e:
   Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5 blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that all is well with the RAID set.
Of importance here are the component lines which read
.Sq optimal ,
and the
.Sq Parity status
line which indicates that the parity is up-to-date.
Note that if there are file systems open on the RAID set,
the individual components will not be
.Sq clean
but the set as a whole can still be clean.
.Pp
To check the component label of
.Pa /dev/sd1e ,
the following is used:
.Bd -literal -offset indent
raidctl -g /dev/sd1e raid0
.Ed
.Pp
The output of this command will look something like:
.Bd -literal -offset indent
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5 blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
.Ed
.Ss Dealing with Component Failures
If for some reason
(perhaps to test reconstruction) it is necessary to pretend a drive
has failed, the following will perform that function:
.Bd -literal -offset indent
raidctl -f /dev/sd2e raid0
.Ed
.Pp
The system will then be performing all operations in degraded mode,
where missing data is re-computed from existing data and the parity.
In this case, obtaining the status of raid0 will return (in part):
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: failed
   /dev/sd3e: optimal
Spares:
   /dev/sd4e: spare
.Ed
.Pp
Note that with the use of
.Fl f
a reconstruction has not been started.
To both fail the disk and start a reconstruction, the
.Fl F
option must be used:
.Bd -literal -offset indent
raidctl -F /dev/sd2e raid0
.Ed
.Pp
The
.Fl f
option may be used first, and then the
.Fl F
option used later, on the same disk, if desired.
Immediately after the reconstruction is started, the status will report:
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: reconstructing
   /dev/sd3e: optimal
Spares:
   /dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that a reconstruction is in progress.
To find out how the reconstruction is progressing the
.Fl S
option may be used.
This will indicate the progress in terms of the
percentage of the reconstruction that is completed.
When the reconstruction is finished the
.Fl s
option will show:
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: spared
   /dev/sd3e: optimal
Spares:
   /dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
At this point there are at least two options.
First, if
.Pa /dev/sd2e
is known to be good (i.e., the failure was either caused by
.Fl f
or
.Fl F ,
or the failed disk was replaced), then a copyback of the data can
be initiated with the
.Fl B
option.
In this example, this would copy the entire contents of
.Pa /dev/sd4e
to
.Pa /dev/sd2e .
Once the copyback procedure is complete, the
status of the device would be (in part):
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: optimal
   /dev/sd3e: optimal
Spares:
   /dev/sd4e: spare
.Ed
.Pp
and the system is back to normal operation.
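.Pp
The whole cycle described above, from a simulated failure through
copyback, could be rehearsed on a test set with a sequence such as the
following.
This is a sketch only, using the component names from this example;
each step should be allowed to finish before the next is started:
.Bd -literal -offset indent
raidctl -F /dev/sd2e raid0    # fail sd2e and reconstruct onto the spare
raidctl -S raid0              # check the reconstruction progress
raidctl -s raid0              # sd2e should now be reported as spared
raidctl -B raid0              # copy the data back from sd4e to sd2e
raidctl -S raid0              # check the copyback progress
.Ed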
.Pp
The second option after the reconstruction is to simply use
.Pa /dev/sd4e
in place of
.Pa /dev/sd2e
in the configuration file.
For example, the configuration file (in part) might now look like:
.Bd -literal -offset indent
START array
1 3 0

START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e
.Ed
.Pp
This can be done as
.Pa /dev/sd4e
is completely interchangeable with
.Pa /dev/sd2e
at this point.
Note that extreme care must be taken when
changing the order of the drives in a configuration.
This is one of the few instances where the devices and/or
their orderings can be changed without loss of data!
In general, the ordering of components in a configuration file should
.Em never
be changed.
.Pp
If a component fails and there are no hot spares
available on-line, the status of the RAID set might (in part) look like:
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: failed
   /dev/sd3e: optimal
No spares.
.Ed
.Pp
In this case there are a number of options.
The first option is to add a hot spare using:
.Bd -literal -offset indent
raidctl -a /dev/sd4e raid0
.Ed
.Pp
After the hot add, the status would then be:
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: failed
   /dev/sd3e: optimal
Spares:
   /dev/sd4e: spare
.Ed
.Pp
Reconstruction could then take place using
.Fl F
as described above.
.Pp
A second option is to rebuild directly onto
.Pa /dev/sd2e .
Once the disk containing
.Pa /dev/sd2e
has been replaced, one can simply use:
.Bd -literal -offset indent
raidctl -R /dev/sd2e raid0
.Ed
.Pp
to rebuild the
.Pa /dev/sd2e
component.
While the rebuilding is in progress, the status will be:
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: reconstructing
   /dev/sd3e: optimal
No spares.
.Ed
.Pp
and when completed, will be:
.Bd -literal -offset indent
Components:
   /dev/sd1e: optimal
   /dev/sd2e: optimal
   /dev/sd3e: optimal
No spares.
.Ed
.Pp
In circumstances where a particular component is completely
unavailable after a reboot, a special component name will be used to
indicate the missing component.
For example:
.Bd -literal -offset indent
Components:
   /dev/sd2e: optimal
   component1: failed
No spares.
.Ed
.Pp
indicates that the second component of this RAID set was not detected
at all by the auto-configuration code.
The name
.Sq component1
can be used anywhere a normal component name would be used.
For example, to add a hot spare to the above set, and rebuild to that hot
spare, the following could be done:
.Bd -literal -offset indent
raidctl -a /dev/sd3e raid0
raidctl -F component1 raid0
.Ed
.Pp
at which point the data missing from
.Sq component1
would be reconstructed onto
.Pa /dev/sd3e .
.Pp
|
|
When more than one component is marked as
|
|
.Sq failed
|
|
due to a non-component hardware failure (e.g., loss of power to two
|
|
components, adapter problems, termination problems, or cabling issues) it
|
|
is quite possible to recover the data on the RAID set.
|
|
The first thing to be aware of is that the first disk to fail will
|
|
almost certainly be out-of-sync with the remainder of the array.
|
|
If any IO was performed between the time the first component is considered
|
|
.Sq failed
|
|
and when the second component is considered
|
|
.Sq failed ,
|
|
then the first component to fail will
|
|
.Em not
|
|
contain correct data, and should be ignored.
|
|
When the second component is marked as failed, however, the RAID device will
|
|
(currently) panic the system.
|
|
At this point the data on the RAID set
|
|
(not including the first failed component) is still self-consistent,
|
|
and will be in no worse state of repair than had the power gone out in
|
|
the middle of a write to a filesystem on a non-RAID device.
|
|
The problem, however, is that the component labels may now have 3 different
|
|
.Sq modification counters
|
|
(one value on the first component that failed, one value on the second
|
|
component that failed, and a third value on the remaining components).
|
|
In such a situation, the RAID set will not autoconfigure,
|
|
and can only be forcibly re-configured
|
|
with the
|
|
.Fl C
|
|
option.
|
|
To recover the RAID set, one must first remedy whatever physical
|
|
problem caused the multiple-component failure.
|
|
After that is done, the RAID set can be restored by forcibly
|
|
configuring the RAID set
|
|
.Em without
|
|
the component that failed first.
|
|
For example, if
|
|
.Pa /dev/sd1e
|
|
and
|
|
.Pa /dev/sd2e
|
|
fail (in that order) in a RAID set of the following configuration:
|
|
.Bd -literal -offset indent
|
|
START array
|
|
1 4 0
|
|
|
|
START disks
|
|
/dev/sd1e
|
|
/dev/sd2e
|
|
/dev/sd3e
|
|
/dev/sd4e
|
|
|
|
START layout
|
|
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
|
|
64 1 1 5
|
|
|
|
START queue
|
|
fifo 100
|
|
|
|
.Ed
|
|
.Pp
|
|
then the following configuration (say "recover_raid0.conf")
|
|
.Bd -literal -offset indent
|
|
START array
|
|
1 4 0
|
|
|
|
START disks
|
|
/dev/sd6e
|
|
/dev/sd2e
|
|
/dev/sd3e
|
|
/dev/sd4e
|
|
|
|
START layout
|
|
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
|
|
64 1 1 5
|
|
|
|
START queue
|
|
fifo 100
|
|
.Ed
|
|
.Pp
|
|
(where
|
|
.Pa /dev/sd6e
|
|
has no physical device) can be used with
|
|
.Bd -literal -offset indent
|
|
raidctl -C recover_raid0.conf raid0
|
|
.Ed
|
|
.Pp
|
|
to force the configuration of raid0.
|
|
A
|
|
.Bd -literal -offset indent
|
|
raidctl -I 12345 raid0
|
|
.Ed
|
|
.Pp
|
|
will be required in order to synchronize the component labels.
|
|
At this point the filesystems on the RAID set can then be checked and
|
|
corrected.
|
|
To complete the reconstruction of the RAID set,
|
|
.Pa /dev/sd1e
|
|
is simply hot-added back into the array, and reconstructed
|
|
as described earlier.
|
|
.Ss RAID on RAID
|
|
RAID sets can be layered to create more complex and much larger RAID sets.
|
|
A RAID 0 set, for example, could be constructed from four RAID 5 sets.
|
|
The following configuration file shows such a setup:
|
|
.Bd -literal -offset indent
|
|
START array
|
|
# numRow numCol numSpare
|
|
1 4 0
|
|
|
|
START disks
|
|
/dev/raid1e
|
|
/dev/raid2e
|
|
/dev/raid3e
|
|
/dev/raid4e
|
|
|
|
START layout
|
|
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
|
|
128 1 1 0
|
|
|
|
START queue
|
|
fifo 100
|
|
.Ed
|
|
.Pp
|
|
A similar configuration file might be used for a RAID 0 set
|
|
constructed from components on RAID 1 sets.
|
|
In such a configuration, the mirroring provides a high degree
|
|
of redundancy, while the striping provides additional speed benefits.
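.Pp
As a sketch of such a setup, the following configuration stripes across
two RAID 1 sets.
The component names
.Pa /dev/raid1e
and
.Pa /dev/raid2e
are assumptions chosen for illustration, and each is taken to be a
partition on an already configured RAID 1 set:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/raid1e
/dev/raid2e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
128 1 1 0

START queue
fifo 100
.Ed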
|
|
.Ss Auto-configuration and Root on RAID
|
|
RAID sets can also be auto-configured at boot.
|
|
To make a set auto-configurable,
|
|
simply prepare the RAID set as above, and then do a:
|
|
.Bd -literal -offset indent
|
|
raidctl -A yes raid0
|
|
.Ed
|
|
.Pp
|
|
to turn on auto-configuration for that set.
|
|
To turn off auto-configuration, use:
|
|
.Bd -literal -offset indent
|
|
raidctl -A no raid0
|
|
.Ed
|
|
.Pp
|
|
RAID sets which are auto-configurable will be configured before the
|
|
root file system is mounted.
|
|
These RAID sets are thus available for
|
|
use as a root file system, or for any other file system.
|
|
A primary advantage of using the auto-configuration is that RAID components
|
|
become more independent of the disks they reside on.
|
|
For example, SCSI IDs can change, but auto-configured sets will always be
|
|
configured correctly, even if the SCSI IDs of the component disks
|
|
have become scrambled.
|
|
.Pp
|
|
Having a system's root file system
|
|
.Pq Pa /
|
|
on a RAID set is also allowed, with the
|
|
.Sq a
|
|
partition of such a RAID set being used for
|
|
.Pa / .
|
|
To use raid0a as the root file system, simply use:
|
|
.Bd -literal -offset indent
|
|
raidctl -A root raid0
|
|
.Ed
|
|
.Pp
|
|
To return raid0 to being just an auto-configuring set, simply use the
|
|
.Fl A Ar yes
|
|
arguments.
|
|
.Pp
|
|
Note that kernels can only be directly read from RAID 1 components on
|
|
alpha and pmax architectures.
|
|
On those architectures, the
|
|
.Dv FS_RAID
|
|
file system is recognized by the bootblocks, and will properly load the
|
|
kernel directly from a RAID 1 component.
|
|
For other architectures, or to support the root file system
|
|
on other RAID sets, some other mechanism must be used to get a kernel booting.
|
|
For example, a small partition containing only the secondary boot-blocks
|
|
and an alternate kernel (or two) could be used.
|
|
Once a kernel is booting, however, and an auto-configuring RAID set is
|
|
found that is eligible to be root, then that RAID set will be
|
|
auto-configured and used as the root device.
|
|
If two or more RAID sets claim to be root devices, then the
|
|
user will be prompted to select the root device.
|
|
At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
|
|
.Pp
|
|
A typical RAID 1 setup with root on RAID might be as follows:
|
|
.Bl -enum
|
|
.It
|
|
wd0a - a small partition, which contains a complete, bootable, basic
|
|
.Nx
|
|
installation.
|
|
.It
|
|
wd1a - also contains a complete, bootable, basic
|
|
.Nx
|
|
installation.
|
|
.It
|
|
wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
|
|
.It
|
|
wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
|
|
swap space.
|
|
.It
|
|
wd0g and wd1g - a RAID 1 set, raid2, used for
|
|
.Pa /usr ,
|
|
.Pa /home ,
|
|
or other data, if desired.
|
|
.It
|
|
wd0h and wd1h - a RAID 1 set, raid3, if desired.
|
|
.El
|
|
.Pp
|
|
RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
|
|
raid0 is marked as being a root file system.
|
|
When new kernels are installed, the kernel is not only copied to
|
|
.Pa / ,
|
|
but also to wd0a and wd1a.
|
|
The kernel on wd0a is required, since that
|
|
is the kernel the system boots from.
|
|
The kernel on wd1a is also
|
|
required, since that will be the kernel used should wd0 fail.
|
|
The important point here is to have redundant copies of the kernel
|
|
available, in the event that one of the drives fails.
|
|
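.Pp
Assuming the three sets above have already been configured and
initialized, the markings just described might be applied as follows:
.Bd -literal -offset indent
raidctl -A root raid0
raidctl -A yes raid1
raidctl -A yes raid2
.Ed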
.Pp
|
|
There is no requirement that the root file system be on the same disk
|
|
as the kernel.
|
|
For example, obtaining the kernel from wd0a, and using
|
|
sd0e and sd1e for raid0, and the root file system, is fine.
|
|
It
|
|
.Em is
|
|
critical, however, that there be multiple kernels available, in the
|
|
event of media failure.
|
|
.Pp
|
|
Multi-layered RAID devices (such as a RAID 0 set made
|
|
up of RAID 1 sets) are
|
|
.Em not
|
|
supported as root devices or auto-configurable devices at this point.
|
|
(Multi-layered RAID devices
|
|
.Em are
|
|
supported in general, however, as mentioned earlier.)
|
|
Note that in order to enable component auto-detection and
|
|
auto-configuration of RAID devices, the line:
|
|
.Bd -literal -offset indent
|
|
options RAID_AUTOCONFIG
|
|
.Ed
|
|
.Pp
|
|
must be in the kernel configuration file.
|
|
See
|
|
.Xr raid 4
|
|
for more details.
|
|
.Ss Unconfiguration
|
|
The final operation performed by
|
|
.Nm
|
|
is to unconfigure a
|
|
.Xr raid 4
|
|
device.
|
|
This is accomplished via a simple:
|
|
.Bd -literal -offset indent
|
|
raidctl -u raid0
|
|
.Ed
|
|
.Pp
|
|
at which point the device is ready to be reconfigured.
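.Pp
As an illustration, a set could be unconfigured and later brought back
with its existing configuration file.
The file name
.Pa raid0.conf
is the same hypothetical name used in the summary below:
.Bd -literal -offset indent
raidctl -u raid0
raidctl -c raid0.conf raid0
.Ed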
|
|
.Ss Performance Tuning
|
|
Selection of the various parameter values which result in the best
|
|
performance can be quite tricky, and often requires a bit of
|
|
trial-and-error to get those values most appropriate for a given system.
|
|
A whole range of factors come into play, including:
|
|
.Bl -enum
|
|
.It
|
|
Types of components (e.g., SCSI vs. IDE) and their bandwidth
|
|
.It
|
|
Types of controller cards and their bandwidth
|
|
.It
|
|
Distribution of components among controllers
|
|
.It
|
|
IO bandwidth
|
|
.It
|
|
file system access patterns
|
|
.It
|
|
CPU speed
|
|
.El
|
|
.Pp
|
|
As with most performance tuning, benchmarking under real-life loads
|
|
may be the only way to measure expected performance.
|
|
Understanding some of the underlying technology is also useful in tuning.
|
|
The goal of this section is to provide pointers to those parameters which may
|
|
make significant differences in performance.
|
|
.Pp
|
|
For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
|
|
Since data in a RAID 1 set is arranged in a linear
|
|
fashion on each component, selecting an appropriate stripe size is
|
|
somewhat less critical than it is for a RAID 5 set.
|
|
However, a stripe size that is too small will cause large IOs to be
|
|
broken up into a number of smaller ones, hurting performance.
|
|
At the same time, a large stripe size may cause problems with
|
|
concurrent accesses to stripes, which may also affect performance.
|
|
Thus values in the range of 32 to 128 are often the most effective.
|
|
.Pp
|
|
Tuning RAID 5 sets is trickier.
|
|
In the best case, IO is presented to the RAID set one stripe at a time.
|
|
Since the entire stripe is available at the beginning of the IO,
|
|
the parity of that stripe can be calculated before the stripe is written,
|
|
and then the stripe data and parity can be written in parallel.
|
|
When the amount of data being written is less than a full stripe worth, the
|
|
.Sq small write
|
|
problem occurs.
|
|
Since a
|
|
.Sq small write
|
|
means only a portion of the stripe on the components is going to
|
|
change, the data (and parity) on the components must be updated
|
|
slightly differently.
|
|
First, the
|
|
.Sq old parity
|
|
and
|
|
.Sq old data
|
|
must be read from the components.
|
|
Then the new parity is constructed,
|
|
using the new data to be written, and the old data and old parity.
|
|
Finally, the new data and new parity are written.
|
|
All this extra data shuffling results in a serious loss of performance,
|
|
and is typically 2 to 4 times slower than a full stripe write (or read).
|
|
To combat this problem in the real world, it may be useful
|
|
to ensure that stripe sizes are small enough that a
|
|
.Sq large IO
|
|
from the system will use exactly one large stripe write.
|
|
As will be seen later, there are some file system dependencies
|
|
which may come into play here as well.
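.Pp
The small-write parity update described above can be written as follows,
where XOR is bitwise exclusive-or, the parity function conventionally
used for RAID 5.
The notation is illustrative only and is not taken from the RAIDframe
sources:
.Bd -literal -offset indent
new_parity = old_parity XOR old_data XOR new_data
.Ed
.Pp
A small write to a single stripe unit therefore costs two reads and two
writes, whereas a full stripe write needs no reads at all.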
|
|
.Pp
|
|
Since the size of a
|
|
.Sq large IO
|
|
is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
|
|
be desirable to select a SectPerSU value of 16 blocks (8K) or 32
|
|
blocks (16K).
|
|
Since there are 4 data stripe units per stripe, the maximum
|
|
data per stripe is 64 blocks (32K) or 128 blocks (64K).
|
|
Again, empirical measurement will provide the best indicators of which
|
|
values will yield better performance.
|
|
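.Pp
The arithmetic behind these figures can be sketched as follows, assuming
512-byte sectors and a 5-component RAID 5 set (4 data stripe units plus
1 parity stripe unit per stripe):
.Bd -literal -offset indent
data per stripe = SectPerSU * (components - 1)

SectPerSU 16:  16 * 4 =  64 sectors * 512 bytes = 32K
SectPerSU 32:  32 * 4 = 128 sectors * 512 bytes = 64K
.Ed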
.Pp
|
|
The parameters used for the file system are also critical to good performance.
|
|
For
|
|
.Xr newfs 8 ,
|
|
for example, increasing the block size to 32K or 64K may improve
|
|
performance dramatically.
|
|
As well, changing the cylinders-per-group
|
|
parameter from 16 to 32 or higher is often not only necessary for
|
|
larger file systems, but may also have positive performance implications.
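.Pp
For instance, a file system with 32K blocks could be requested as shown
below.
The 4K fragment size and the partition name are assumptions chosen for
illustration; see
.Xr newfs 8
for the values supported on a given system:
.Bd -literal -offset indent
newfs -b 32768 -f 4096 /dev/rraid0e
.Ed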
|
|
.Ss Summary
|
|
Despite the length of this man-page, configuring a RAID set is a
|
|
relatively straightforward process.
|
|
All that needs to be done is the following steps:
|
|
.Bl -enum
|
|
.It
|
|
Use
|
|
.Xr disklabel 8
|
|
to create the components (of type RAID).
|
|
.It
|
|
Construct a RAID configuration file: e.g.,
|
|
.Pa raid0.conf
|
|
.It
|
|
Configure the RAID set with:
|
|
.Bd -literal -offset indent
|
|
raidctl -C raid0.conf raid0
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Initialize the component labels with:
|
|
.Bd -literal -offset indent
|
|
raidctl -I 123456 raid0
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Initialize other important parts of the set with:
|
|
.Bd -literal -offset indent
|
|
raidctl -i raid0
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Get the default label for the RAID set:
|
|
.Bd -literal -offset indent
|
|
disklabel raid0 \*[Gt] /tmp/label
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Edit the label:
|
|
.Bd -literal -offset indent
|
|
vi /tmp/label
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Put the new label on the RAID set:
|
|
.Bd -literal -offset indent
|
|
disklabel -R -r raid0 /tmp/label
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Create the file system:
|
|
.Bd -literal -offset indent
|
|
newfs /dev/rraid0e
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Mount the file system:
|
|
.Bd -literal -offset indent
|
|
mount /dev/raid0e /mnt
|
|
.Ed
|
|
.Pp
|
|
.It
|
|
Use:
|
|
.Bd -literal -offset indent
|
|
raidctl -c raid0.conf raid0
|
|
.Ed
|
|
.Pp
|
|
to re-configure the RAID set the next time it is needed, or put
|
|
.Pa raid0.conf
|
|
into
|
|
.Pa /etc
|
|
where it will automatically be started by the
|
|
.Pa /etc/rc.d
|
|
scripts.
|
|
.El
|
|
.Sh SEE ALSO
|
|
.Xr ccd 4 ,
|
|
.Xr raid 4 ,
|
|
.Xr rc 8
|
|
.Sh HISTORY
|
|
RAIDframe is a framework for rapid prototyping of RAID structures
|
|
developed by the folks at the Parallel Data Laboratory at Carnegie
|
|
Mellon University (CMU).
|
|
A more complete description of the internals and functionality of
|
|
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
|
|
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
|
|
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
|
|
Parallel Data Laboratory of Carnegie Mellon University.
|
|
.Pp
|
|
The
|
|
.Nm
|
|
command first appeared as a program in CMU's RAIDframe v1.1 distribution.
|
|
This version of
|
|
.Nm
|
|
is a complete re-write, and first appeared in
|
|
.Nx 1.4 .
|
|
.Sh COPYRIGHT
|
|
.Bd -literal
|
|
The RAIDframe Copyright is as follows:
|
|
|
|
Copyright (c) 1994-1996 Carnegie-Mellon University.
|
|
All rights reserved.
|
|
|
|
Permission to use, copy, modify and distribute this software and
|
|
its documentation is hereby granted, provided that both the copyright
|
|
notice and this permission notice appear in all copies of the
|
|
software, derivative works or modified versions, and any portions
|
|
thereof, and that both notices appear in supporting documentation.
|
|
|
|
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
|
|
CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
|
|
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
|
|
|
|
Carnegie Mellon requests users of this software to return to
|
|
|
|
Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
|
|
School of Computer Science
|
|
Carnegie Mellon University
|
|
Pittsburgh PA 15213-3890
|
|
|
|
any improvements or extensions that they make and grant Carnegie the
|
|
rights to redistribute these changes.
|
|
.Ed
|
|
.Sh WARNINGS
|
|
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
|
|
data loss due to component failure.
|
|
However, the loss of two components of a RAID 4 or 5 system,
|
|
or the loss of a single component of a RAID 0 system will
|
|
result in the entire file system being lost.
|
|
RAID is
|
|
.Em NOT
|
|
a substitute for good backup practices.
|
|
.Pp
|
|
Recomputation of parity
|
|
.Em MUST
|
|
be performed whenever there is a chance that it may have been compromised.
|
|
This includes after system crashes, or before a RAID
|
|
device has been used for the first time.
|
|
Failure to keep parity correct will be catastrophic should a
|
|
component ever fail \(em it is better to use RAID 0 and get the
|
|
additional space and speed, than it is to use parity, but
|
|
not keep the parity correct.
|
|
At least with RAID 0 there is no perception of increased data security.
|
|
.Sh BUGS
|
|
Hot-spare removal is currently not available.
|