f2b04ca083

convert several raidframe ioctls to be bitsize idempotent so that they
work the same in 32 and 64 bit worlds, allowing netbsd32 to configure
and query raid properly.  remove useless 'row' in a few places.  add
COMPAT_80 and put the old ioctls there.

raidframeio.h:
	RAIDFRAME_TEST_ACC - remove, unused
	RAIDFRAME_GET_COMPONENT_LABEL - convert to label not pointer to label
	RAIDFRAME_CHECK_RECON_STATUS_EXT
	RAIDFRAME_CHECK_PARITYREWRITE_STATUS_EXT
	RAIDFRAME_CHECK_COPYBACK_STATUS_EXT - convert to progress info not
	    pointer to info
	RAIDFRAME_GET_INFO - version entirely.

raidframevar.h:
	- rf_recon_req{} has row, flags and raidPtr removed (they're not a
	  useful part of this interface.)
	- RF_Config_s{} and RF_DeviceConfig_s{} have numRow/rows removed.
	- RF_RaidDisk_s{} is re-ordered slightly to fix alignment padding -
	  the actual data was already OK.
	- InstallSpareTable() loses row argument

rf_compat32.c has code for RF_Config_s{} in 32 bit mode, used by
RAIDFRAME_CONFIGURE and RAIDFRAME_GET_INFO32.

rf_compat80.c has code for rf_recon_req{}, RF_RaidDisk_s{} and
RF_DeviceConfig_s{} to handle RAIDFRAME_FAIL_DISK,
RAIDFRAME_GET_COMPONENT_LABEL, RAIDFRAME_CHECK_RECON_STATUS_EXT,
RAIDFRAME_CHECK_PARITYREWRITE_STATUS_EXT,
RAIDFRAME_CHECK_COPYBACK_STATUS_EXT, RAIDFRAME_GET_INFO.

move several of the per-ioctl code blocks into separate functions.

add rf_recon_req_internal{} to replace old usage of global
rf_recon_req{} that had an unused void * in the structure, ruining
its 32/64 bit ABI.

add missing case for RAIDFRAME_GET_INFO50.

adjust raid tests to use the new .conf format, and add a case to
test the old method as well.

raidctl:
	deal with lack of 'row' members in a couple of places.
	fail request no longer takes row.
	handle "START array" sections with just "numCol numSpare", i.e.,
	no "numRow" specified.  for now, generate old-style configuration
	but update raidctl.8 to specify the new style (keeping reference
	to the old style.)

note that: RF_ComponentLabel_s::{row,num_rows} and
RF_SingleComponent_s::row are obsolete but not removed yet.
1634 lines
47 KiB
Groff
.\" $NetBSD: raidctl.8,v 1.74 2018/01/18 00:32:49 mrg Exp $
.\"
.\" Copyright (c) 1998, 2002 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd January 6, 2016
.Dt RAIDCTL 8
.Os
.Sh NAME
.Nm raidctl
.Nd configuration utility for the RAIDframe disk driver
.Sh SYNOPSIS
.Nm
.Op Fl v
.Fl A Op yes | no | forceroot | softroot
.Ar dev
.Nm
.Op Fl v
.Fl a Ar component Ar dev
.Nm
.Op Fl v
.Fl B Ar dev
.Nm
.Op Fl v
.Fl C Ar config_file Ar dev
.Nm
.Op Fl v
.Fl c Ar config_file Ar dev
.Nm
.Op Fl v
.Fl F Ar component Ar dev
.Nm
.Op Fl v
.Fl f Ar component Ar dev
.Nm
.Op Fl v
.Fl G Ar dev
.Nm
.Op Fl v
.Fl g Ar component Ar dev
.Nm
.Op Fl v
.Fl I Ar serial_number Ar dev
.Nm
.Op Fl v
.Fl i Ar dev
.Nm
.Op Fl v
.Fl M
.Oo yes | no | set
.Ar params
.Oc
.Ar dev
.Nm
.Op Fl v
.Fl m Ar dev
.Nm
.Op Fl v
.Fl P Ar dev
.Nm
.Op Fl v
.Fl p Ar dev
.Nm
.Op Fl v
.Fl R Ar component Ar dev
.Nm
.Op Fl v
.Fl r Ar component Ar dev
.Nm
.Op Fl v
.Fl S Ar dev
.Nm
.Op Fl v
.Fl s Ar dev
.Nm
.Op Fl v
.Fl U Ar unit Ar dev
.Nm
.Op Fl v
.Fl u Ar dev
.Sh DESCRIPTION
.Nm
is the user-land control program for
.Xr raid 4 ,
the RAIDframe disk device.
.Nm
is primarily used to dynamically configure and unconfigure RAIDframe disk
devices.
For more information about the RAIDframe disk device, see
.Xr raid 4 .
.Pp
This document assumes the reader has at least rudimentary knowledge of
RAID and RAID concepts.
.Pp
The command-line options for
.Nm
are as follows:
.Bl -tag -width indent
.It Fl A Ic yes Ar dev
Make the RAID set auto-configurable.
The RAID set will be automatically configured at boot
.Ar before
the root file system is mounted.
Note that all components of the set must be of type
.Dv RAID
in the disklabel.
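For example, to mark the set configured as raid0 auto-configurable:
.Bd -literal -offset indent
raidctl -A yes raid0
.Ed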
.It Fl A Ic no Ar dev
Turn off auto-configuration for the RAID set.
.It Fl A Ic forceroot Ar dev
Make the RAID set auto-configurable, and also mark the set as being
eligible to be the root partition.
A RAID set configured this way will
.Ar override
the use of the boot disk as the root device.
All components of the set must be of type
.Dv RAID
in the disklabel.
Note that only certain architectures
.Pq currently alpha, amd64, i386, pmax, sandpoint, sparc, sparc64, and vax
support booting a kernel directly from a RAID set.
Please note that
.Ic forceroot
mode was referred to as
.Ic root
mode on earlier versions of
.Nx .
For compatibility reasons,
.Ic root
can be used as an alias for
.Ic forceroot .
.It Fl A Ic softroot Ar dev
Like
.Ic forceroot ,
but only change the root device if the boot device is part of the RAID set.
.It Fl a Ar component Ar dev
Add
.Ar component
as a hot spare for the device
.Ar dev .
Component labels (which identify the location of a given
component within a particular RAID set) are automatically added to the
hot spare after it has been used and are not required for
.Ar component
before it is used.
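For example, to add
.Pa /dev/sd4e
as a hot spare for raid0:
.Bd -literal -offset indent
raidctl -a /dev/sd4e raid0
.Ed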
.It Fl B Ar dev
Initiate a copyback of reconstructed data from a spare disk to
its original disk.
This is performed after a component has failed,
and the failed drive has been reconstructed onto a spare drive.
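For example, once the replacement disk is back in place and reconstruction
to the spare has completed:
.Bd -literal -offset indent
raidctl -B raid0
.Ed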
.It Fl C Ar config_file Ar dev
As for
.Fl c ,
but forces the configuration to take place.
Fatal errors due to uninitialized components are ignored.
This is required the first time a RAID set is configured.
.It Fl c Ar config_file Ar dev
Configure the RAIDframe device
.Ar dev
according to the configuration given in
.Ar config_file .
A description of the contents of
.Ar config_file
is given later.
.It Fl F Ar component Ar dev
Fails the specified
.Ar component
of the device, and immediately begins a reconstruction of the failed
disk onto an available hot spare.
This is one of the mechanisms used to start
the reconstruction process if a component does have a hardware failure.
.It Fl f Ar component Ar dev
This marks the specified
.Ar component
as having failed, but does not initiate a reconstruction of that component.
.It Fl G Ar dev
Generate the configuration of the RAIDframe device in a format suitable for
use with the
.Fl c
or
.Fl C
options.
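For example, to save the current configuration of raid0 for later re-use
with
.Fl c :
.Bd -literal -offset indent
raidctl -G raid0 > raid0.conf
.Ed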
.It Fl g Ar component Ar dev
Get the component label for the specified component.
.It Fl I Ar serial_number Ar dev
Initialize the component labels on each component of the device.
.Ar serial_number
is used as one of the keys in determining whether a
particular set of components belong to the same RAID set.
While not strictly enforced, different serial numbers should be used for
different RAID sets.
This step
.Em MUST
be performed when a new RAID set is created.
.It Fl i Ar dev
Initialize the RAID device.
In particular, (re-)write the parity on the selected device.
This
.Em MUST
be done for
.Em all
RAID sets before the RAID device is labeled and before
file systems are created on the RAID device.
.It Fl M Ic yes Ar dev
.\"XXX should there be a section with more info on the parity map feature?
Enable the use of a parity map on the RAID set; this is the default,
and greatly reduces the time taken to check parity after unclean
shutdowns at the cost of some very slight overhead during normal
operation.
Changes to this setting will take effect the next time the set is
configured.
Note that RAID-0 sets, having no parity, will not use a parity map in
any case.
.It Fl M Ic no Ar dev
Disable the use of a parity map on the RAID set; doing this is not
recommended.
This will take effect the next time the set is configured.
.It Fl M Ic set Ar cooldown Ar tickms Ar regions Ar dev
Alter the parameters of the parity map; parameters to leave unchanged
can be given as 0, and trailing zeroes may be omitted.
.\"XXX should this explanation be deferred to another section as well?
The RAID set is divided into
.Ar regions
regions; each region is marked dirty for at most
.Ar cooldown
intervals of
.Ar tickms
milliseconds each after a write to it, and at least
.Ar cooldown
\- 1 such intervals.
Changes to
.Ar regions
take effect the next time the set is configured, while changes to the other
parameters are applied immediately.
The default parameters are expected to be reasonable for most workloads.
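For example, to change only the number of regions, leaving the other
parameters untouched:
.Bd -literal -offset indent
raidctl -M set 0 0 4096 raid0
.Ed
.Pp
The new region count takes effect the next time the set is configured.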
.It Fl m Ar dev
Display status information about the parity map on the RAID set, if any.
If used with
.Fl v
then the current contents of the parity map will be output (in
hexadecimal format) as well.
.It Fl P Ar dev
Check the status of the parity on the RAID set, and initialize
(re-write) the parity if the parity is not known to be up-to-date.
This is normally used after a system crash (and before a
.Xr fsck 8 )
to ensure the integrity of the parity.
.It Fl p Ar dev
Check the status of the parity on the RAID set.
Displays a status message,
and returns successfully if the parity is up-to-date.
.It Fl R Ar component Ar dev
Fails the specified
.Ar component ,
if necessary, and immediately begins a reconstruction back to
.Ar component .
This is useful for reconstructing back onto a component after
it has been replaced following a failure.
.It Fl r Ar component Ar dev
Remove the spare disk specified by
.Ar component
from the set of available spare components.
.It Fl S Ar dev
Check the status of parity re-writing, component reconstruction, and
component copyback.
The output indicates the amount of progress
achieved in each of these areas.
.It Fl s Ar dev
Display the status of the RAIDframe device for each of the components
and spares.
.It Fl U Ar unit Ar dev
Set the
.Dv last_unit
field in all the components, so that the next time the RAID set
is auto-configured it uses that
.Ar unit .
.It Fl u Ar dev
Unconfigure the RAIDframe device.
This does not remove any component labels or change any configuration
settings (e.g. auto-configuration settings) for the RAID set.
.It Fl v
Be more verbose.
For operations such as reconstructions, parity
re-writing, and copybacks, provide a progress indicator.
.El
.Pp
The device used by
.Nm
is specified by
.Ar dev .
.Ar dev
may be either the full name of the device, e.g.,
.Pa /dev/rraid0d ,
for the i386 architecture, or
.Pa /dev/rraid0c
for many others, or just simply
.Pa raid0
(for
.Pa /dev/rraid0[cd] ) .
It is recommended that the partitions used to represent the
RAID device are not used for file systems.
.Ss Configuration file
The format of the configuration file is complex, and
only an abbreviated treatment is given here.
In the configuration files, a
.Sq #
indicates the beginning of a comment.
.Pp
There are 4 required sections of a configuration file, and 2
optional sections.
Each section begins with a
.Sq START ,
followed by the section name,
and the configuration parameters associated with that section.
The first section is the
.Sq array
section, and it specifies
the number of columns, and spare disks in the RAID set.
For example:
.Bd -literal -offset indent
START array
3 0
.Ed
.Pp
indicates an array with 3 columns, and 0 spare disks.
Old configurations specified a 3rd value in front of the
number of columns and spare disks.
This old value, if provided, must be specified as 1:
.Bd -literal -offset indent
START array
1 3 0
.Ed
.Pp
The second section, the
.Sq disks
section, specifies the actual components of the device.
For example:
.Bd -literal -offset indent
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
.Ed
.Pp
specifies the three component disks to be used in the RAID device.
Disk wedges may also be specified with the NAME=<wedge name> syntax.
If any of the specified drives cannot be found when the RAID device is
configured, then they will be marked as
.Sq failed ,
and the system will operate in degraded mode.
Note that it is
.Em imperative
that the order of the components in the configuration file does not
change between configurations of a RAID device.
Changing the order of the components will result in data loss
if the set is configured with the
.Fl C
option.
In normal circumstances, the RAID set will not configure if only
.Fl c
is specified, and the components are out-of-order.
.Pp
The next section, which is the
.Sq spare
section, is optional, and, if present, specifies the devices to be used as
.Sq hot spares
\(em devices which are on-line,
but are not actively used by the RAID driver unless
one of the main components fail.
A simple
.Sq spare
section might be:
.Bd -literal -offset indent
START spare
/dev/sd3e
.Ed
.Pp
for a configuration with a single spare component.
If no spare drives are to be used in the configuration, then the
.Sq spare
section may be omitted.
.Pp
The next section is the
.Sq layout
section.
This section describes the general layout parameters for the RAID device,
and provides such information as
sectors per stripe unit,
stripe units per parity unit,
stripe units per reconstruction unit,
and the parity configuration to use.
This section might look like:
.Bd -literal -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
.Ed
.Pp
The sectors per stripe unit specifies, in blocks, the interleave
factor; i.e., the number of contiguous sectors to be written to each
component for a single stripe.
Appropriate selection of this value (32 in this example)
is the subject of much research in RAID architectures.
The stripe units per parity unit and
stripe units per reconstruction unit are normally each set to 1.
While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 are outside
the scope of this document.
The last value in this section (5 in this example)
indicates the parity configuration desired.
Valid entries include:
.Bl -tag -width inde
.It 0
RAID level 0.
No parity, only simple striping.
.It 1
RAID level 1.
Mirroring.
The parity is the mirror.
.It 4
RAID level 4.
Striping across components, with parity stored on the last component.
.It 5
RAID level 5.
Striping across components, parity distributed across all components.
.El
.Pp
There are other valid entries here, including those for Even-Odd
parity, RAID level 5 with rotated sparing, Chained declustering,
and Interleaved declustering, but as of this writing the code for
those parity operations has not been tested with
.Nx .
.Pp
The next required section is the
.Sq queue
section.
This is most often specified as:
.Bd -literal -offset indent
START queue
fifo 100
.Ed
.Pp
where the queuing method is specified as fifo (first-in, first-out),
and the size of the per-component queue is limited to 100 requests.
Other queuing methods may also be specified, but a discussion of them
is beyond the scope of this document.
.Pp
The final section, the
.Sq debug
section, is optional.
For more details on this the reader is referred to
the RAIDframe documentation discussed in the
.Sx HISTORY
section.
.Pp
See
.Sx EXAMPLES
for a more complete configuration file example.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Cm raid
device special files.
.El
.Sh EXAMPLES
It is highly recommended that before using the RAID driver for real
file systems that the system administrator(s) become quite familiar
with the use of
.Nm ,
and that they understand how the component reconstruction process works.
The examples in this section will focus on configuring a
number of different RAID sets of varying degrees of redundancy.
By working through these examples, administrators should be able to
develop a good feel for how to configure a RAID set, and how to
initiate reconstruction of failed components.
.Pp
In the following examples
.Sq raid0
will be used to denote the RAID device.
Depending on the architecture,
.Pa /dev/rraid0c
or
.Pa /dev/rraid0d
may be used in place of
.Pa raid0 .
.Ss Initialization and Configuration
The initial step in configuring a RAID set is to identify the components
that will be used in the RAID set.
All components should be the same size.
Each component should have a disklabel type of
.Dv FS_RAID ,
and a typical disklabel entry for a RAID component might look like:
.Bd -literal -offset indent
f:  1800000  200495     RAID                     # (Cyl.  405*- 4041*)
.Ed
.Pp
While
.Dv FS_BSDFFS
will also work as the component type, the type
.Dv FS_RAID
is preferred for RAIDframe use, as it is required for features such as
auto-configuration.
As part of the initial configuration of each RAID set,
each component will be given a
.Sq component label .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the column of that component in
the RAID set, the redundancy level of the RAID set, a
.Sq modification counter ,
and whether the parity information (if any) on that
component is known to be correct.
Component labels are an integral part of the RAID set,
since they are used to ensure that components
are configured in the correct order, and used to keep track of other
vital information about the RAID set.
Component labels are also required for the auto-detection
and auto-configuration of RAID sets at boot time.
For a component label to be considered valid, that
particular component label must be in agreement with the other
component labels in the set.
For example, the serial number,
.Sq modification counter ,
and number of columns must all be in agreement.
If any of these are different, then the component is
not considered to be part of the set.
See
.Xr raid 4
for more information about component labels.
.Pp
Once the components have been identified, and the disks have
appropriate labels,
.Nm
is then used to configure the
.Xr raid 4
device.
To configure the device, a configuration file which looks something like:
.Bd -literal -offset indent
START array
# numCol numSpare
3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
.Ed
.Pp
is created in a file.
The above configuration file specifies a RAID 5
set consisting of the components
.Pa /dev/sd1e ,
.Pa /dev/sd2e ,
and
.Pa /dev/sd3e ,
with
.Pa /dev/sd4e
available as a
.Sq hot spare
in case one of the three main drives should fail.
A RAID 0 set would be specified in a similar way:
.Bd -literal -offset indent
START array
# numCol numSpare
4 0

START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100
.Ed
.Pp
In this case, devices
.Pa /dev/sd10e ,
.Pa /dev/sd11e ,
.Pa /dev/sd12e ,
and
.Pa /dev/sd13e
are the components that make up this RAID set.
Note that there are no hot spares for a RAID 0 set,
since there is no way to recover data if any of the components fail.
.Pp
For a RAID 1 (mirror) set, the following configuration might be used:
.Bd -literal -offset indent
START array
# numCol numSpare
2 0

START disks
/dev/sd20e
/dev/sd21e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
In this case,
.Pa /dev/sd20e
and
.Pa /dev/sd21e
are the two components of the mirror set.
While no hot spares have been specified in this
configuration, they easily could be, just as they were specified in
the RAID 5 case above.
Note as well that RAID 1 sets are currently limited to only 2 components.
At present, n-way mirroring is not possible.
.Pp
The first time a RAID set is configured, the
.Fl C
option must be used:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.Pp
where
.Pa raid0.conf
is the name of the RAID configuration file.
The
.Fl C
forces the configuration to succeed, even if any of the component
labels are incorrect.
The
.Fl C
option should not be used lightly in
situations other than initial configurations, as if
the system is refusing to configure a RAID set, there is probably a
very good reason for it.
After the initial configuration is done (and
appropriate component labels are added with the
.Fl I
option) then raid0 can be configured normally with:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
When the RAID set is configured for the first time, it is
necessary to initialize the component labels, and to initialize the
parity on the RAID set.
Initializing the component labels is done with:
.Bd -literal -offset indent
raidctl -I 112341 raid0
.Ed
.Pp
where
.Sq 112341
is a user-specified serial number for the RAID set.
This initialization step is
.Em required
for all RAID sets.
As well, using different serial numbers between RAID sets is
.Em strongly encouraged ,
as using the same serial number for all RAID sets will only serve to
decrease the usefulness of the component label checking.
.Pp
Initializing the RAID set is done via the
.Fl i
option.
This initialization
.Em MUST
be done for
.Em all
RAID sets, since among other things it verifies that the parity (if
any) on the RAID set is correct.
Since this initialization may be quite time-consuming, the
.Fl v
option may be also used in conjunction with
.Fl i :
.Bd -literal -offset indent
raidctl -iv raid0
.Ed
.Pp
This will give more verbose output on the
status of the initialization:
.Bd -literal -offset indent
Initiating re-write of parity
Parity Re-write status:
 10% |****                                   | ETA:    06:03 /
.Ed
.Pp
The output provides a
.Sq Percent Complete
in both a numeric and graphical format, as well as an estimated time
to completion of the operation.
.Pp
Since it is the parity that provides the
.Sq redundancy
part of RAID, it is critical that the parity is correct as much as possible.
If the parity is not correct, then there is no
guarantee that data will not be lost if a component fails.
.Pp
Once the parity is known to be correct, it is then safe to perform
.Xr disklabel 8 ,
.Xr newfs 8 ,
or
.Xr fsck 8
on the device or its file systems, and then to mount the file systems
for use.
.Pp
Under certain circumstances (e.g., the additional component has not
arrived, or data is being migrated off of a disk destined to become a
component) it may be desirable to configure a RAID 1 set with only
a single component.
This can be achieved by using the word
.Dq absent
to indicate that a particular component is not present.
In the following:
.Bd -literal -offset indent
START array
# numCol numSpare
2 0

START disks
absent
/dev/sd0e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
.Pa /dev/sd0e
is the real component, and will be the second disk of a RAID 1 set.
The first component is simply marked as being absent.
Configuration (using
.Fl C
and
.Fl I Ar 12345
as above) proceeds normally, but initialization of the RAID set will
have to wait until all physical components are present.
After configuration, this set can be used normally, but will be operating
in degraded mode.
Once a second physical component is obtained, it can be hot-added,
the existing data mirrored, and normal operation resumed.
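For example, assuming the new disk appears as
.Pa /dev/sd5e
(a hypothetical name), it could first be hot-added as a spare:
.Bd -literal -offset indent
raidctl -a /dev/sd5e raid0
.Ed
.Pp
after which reconstruction onto the spare can be initiated with the
.Fl F
option.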
.Pp
The size of the resulting RAID set will depend on the number of data
components in the set.
Space is automatically reserved for the component labels, and
the actual amount of space used
for data on a component will be rounded down to the largest possible
multiple of the sectors per stripe unit (sectPerSU) value.
Thus, the amount of space provided by the RAID set will be less
than the sum of the size of the components.
.Ss Maintenance of the RAID set
After the parity has been initialized for the first time, the command:
.Bd -literal -offset indent
raidctl -p raid0
.Ed
.Pp
can be used to check the current status of the parity.
To check the parity and rebuild it if necessary (for example,
|
|
after an unclean shutdown) the command:
|
|
.Bd -literal -offset indent
|
|
raidctl -P raid0
|
|
.Ed
|
|
.Pp
|
|
is used.
|
|
Note that re-writing the parity can be done while
|
|
other operations on the RAID set are taking place (e.g., while doing a
|
|
.Xr fsck 8
|
|
on a file system on the RAID set).
|
|
However: for maximum effectiveness of the RAID set, the parity should be
|
|
known to be correct before any data on the set is modified.
|
|
.Pp
|
|
To see how the RAID set is doing, the following command can be used to
|
|
show the RAID set's status:
|
|
.Bd -literal -offset indent
|
|
raidctl -s raid0
|
|
.Ed
|
|
.Pp
|
|
The output will look something like:
|
|
.Bd -literal -offset indent
|
|
Components:
|
|
/dev/sd1e: optimal
|
|
/dev/sd2e: optimal
|
|
/dev/sd3e: optimal
|
|
Spares:
|
|
/dev/sd4e: spare
|
|
Component label for /dev/sd1e:
|
|
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
|
|
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Component label for /dev/sd2e:
Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Component label for /dev/sd3e:
Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that all is well with the RAID set.
Of importance here are the component lines which read
.Sq optimal ,
and the
.Sq Parity status
line.
.Sq Parity status: clean
indicates that the parity is up-to-date for this RAID set,
whether or not the RAID set is in redundant or degraded mode.
.Sq Parity status: DIRTY
indicates that it is not known if the parity information is
consistent with the data, and that the parity information needs
to be checked.
Note that if there are file systems open on the RAID set,
the individual components will not be
.Sq clean
but the set as a whole can still be clean.
.Pp
To check the component label of
.Pa /dev/sd1e ,
the following is used:
.Bd -literal -offset indent
raidctl -g /dev/sd1e raid0
.Ed
.Pp
The output of this command will look something like:
.Bd -literal -offset indent
Component label for /dev/sd1e:
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
.Ed
.Ss Dealing with Component Failures
If for some reason
(perhaps to test reconstruction) it is necessary to pretend a drive
has failed, the following will perform that function:
.Bd -literal -offset indent
raidctl -f /dev/sd2e raid0
.Ed
.Pp
The system will then be performing all operations in degraded mode,
where missing data is re-computed from existing data and the parity.
In this case, obtaining the status of raid0 will return (in part):
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
.Ed
.Pp
Note that with the use of
.Fl f
a reconstruction has not been started.
To both fail the disk and start a reconstruction, the
.Fl F
option must be used:
.Bd -literal -offset indent
raidctl -F /dev/sd2e raid0
.Ed
.Pp
The
.Fl f
option may be used first, and then the
.Fl F
option used later, on the same disk, if desired.
Immediately after the reconstruction is started, the status will report:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: reconstructing
/dev/sd3e: optimal
Spares:
/dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that a reconstruction is in progress.
To find out how the reconstruction is progressing the
.Fl S
option may be used.
This will indicate the progress in terms of the
percentage of the reconstruction that is completed.
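.Pp
For example, assuming the set in question is raid0, the progress
may be checked with:
.Bd -literal -offset indent
raidctl -S raid0
.Ed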
When the reconstruction is finished the
.Fl s
option will show:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: spared
/dev/sd3e: optimal
Spares:
/dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
At this point there are at least two options.
First, if
.Pa /dev/sd2e
is known to be good (i.e., the failure was either caused by
.Fl f
or
.Fl F ,
or the failed disk was replaced), then a copyback of the data can
be initiated with the
.Fl B
option.
In this example, this would copy the entire contents of
.Pa /dev/sd4e
to
.Pa /dev/sd2e .
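.Pp
Assuming raid0 is the set in question, the copyback is initiated with:
.Bd -literal -offset indent
raidctl -B raid0
.Ed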
Once the copyback procedure is complete, the
status of the device would be (in part):
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
.Ed
.Pp
and the system is back to normal operation.
.Pp
The second option after the reconstruction is to simply use
.Pa /dev/sd4e
in place of
.Pa /dev/sd2e
in the configuration file.
For example, the configuration file (in part) might now look like:
.Bd -literal -offset indent
START array
3 0

START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e
.Ed
.Pp
This can be done as
.Pa /dev/sd4e
is completely interchangeable with
.Pa /dev/sd2e
at this point.
Note that extreme care must be taken when
changing the order of the drives in a configuration.
This is one of the few instances where the devices and/or
their orderings can be changed without loss of data!
In general, the ordering of components in a configuration file should
.Em never
be changed.
.Pp
If a component fails and there are no hot spares
available on-line, the status of the RAID set might (in part) look like:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
No spares.
.Ed
.Pp
In this case there are a number of options.
The first option is to add a hot spare using:
.Bd -literal -offset indent
raidctl -a /dev/sd4e raid0
.Ed
.Pp
After the hot add, the status would then be:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
.Ed
.Pp
Reconstruction could then take place using
.Fl F
as described above.
.Pp
A second option is to rebuild directly onto
.Pa /dev/sd2e .
Once the disk containing
.Pa /dev/sd2e
has been replaced, one can simply use:
.Bd -literal -offset indent
raidctl -R /dev/sd2e raid0
.Ed
.Pp
to rebuild the
.Pa /dev/sd2e
component.
As the rebuilding is in progress, the status will be:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: reconstructing
/dev/sd3e: optimal
No spares.
.Ed
.Pp
and when completed, will be:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
No spares.
.Ed
.Pp
In circumstances where a particular component is completely
unavailable after a reboot, a special component name will be used to
indicate the missing component.
For example:
.Bd -literal -offset indent
Components:
/dev/sd2e: optimal
component1: failed
No spares.
.Ed
.Pp
indicates that the second component of this RAID set was not detected
at all by the auto-configuration code.
The name
.Sq component1
can be used anywhere a normal component name would be used.
For example, to add a hot spare to the above set, and rebuild to that hot
spare, the following could be done:
.Bd -literal -offset indent
raidctl -a /dev/sd3e raid0
raidctl -F component1 raid0
.Ed
.Pp
at which point the data missing from
.Sq component1
would be reconstructed onto
.Pa /dev/sd3e .
.Pp
When more than one component is marked as
.Sq failed
due to a non-component hardware failure (e.g., loss of power to two
components, adapter problems, termination problems, or cabling issues) it
is quite possible to recover the data on the RAID set.
The first thing to be aware of is that the first disk to fail will
almost certainly be out-of-sync with the remainder of the array.
If any IO was performed between the time the first component is considered
.Sq failed
and when the second component is considered
.Sq failed ,
then the first component to fail will
.Em not
contain correct data, and should be ignored.
When the second component is marked as failed, however, the RAID device will
(currently) panic the system.
At this point the data on the RAID set
(not including the first failed component) is still self-consistent,
and will be in no worse state of repair than had the power gone out in
the middle of a write to a file system on a non-RAID device.
The problem, however, is that the component labels may now have 3 different
.Sq modification counters
(one value on the first component that failed, one value on the second
component that failed, and a third value on the remaining components).
In such a situation, the RAID set will not autoconfigure,
and can only be forcibly re-configured
with the
.Fl C
option.
To recover the RAID set, one must first remedy whatever physical
problem caused the multiple-component failure.
After that is done, the RAID set can be restored by forcibly
configuring the RAID set
.Em without
the component that failed first.
For example, if
.Pa /dev/sd1e
and
.Pa /dev/sd2e
fail (in that order) in a RAID set of the following configuration:
.Bd -literal -offset indent
START array
4 0

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
.Ed
.Pp
then the following configuration (say "recover_raid0.conf")
.Bd -literal -offset indent
START array
4 0

START disks
absent
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
.Ed
.Pp
can be used with
.Bd -literal -offset indent
raidctl -C recover_raid0.conf raid0
.Ed
.Pp
to force the configuration of raid0.
A
.Bd -literal -offset indent
raidctl -I 12345 raid0
.Ed
.Pp
will be required in order to synchronize the component labels.
At this point the file systems on the RAID set can then be checked and
corrected.
To complete the reconstruction of the RAID set,
.Pa /dev/sd1e
is simply hot-added back into the array, and reconstructed
as described earlier.
.Ss RAID on RAID
RAID sets can be layered to create more complex and much larger RAID sets.
A RAID 0 set, for example, could be constructed from four RAID 5 sets.
The following configuration file shows such a setup:
.Bd -literal -offset indent
START array
# numCol numSpare
4 0

START disks
/dev/raid1e
/dev/raid2e
/dev/raid3e
/dev/raid4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
128 1 1 0

START queue
fifo 100
.Ed
.Pp
A similar configuration file might be used for a RAID 0 set
constructed from components on RAID 1 sets.
In such a configuration, the mirroring provides a high degree
of redundancy, while the striping provides additional speed benefits.
.Ss Auto-configuration and Root on RAID
RAID sets can also be auto-configured at boot.
To make a set auto-configurable,
simply prepare the RAID set as above, and then do a:
.Bd -literal -offset indent
raidctl -A yes raid0
.Ed
.Pp
to turn on auto-configuration for that set.
To turn off auto-configuration, use:
.Bd -literal -offset indent
raidctl -A no raid0
.Ed
.Pp
RAID sets which are auto-configurable will be configured before the
root file system is mounted.
These RAID sets are thus available for
use as a root file system, or for any other file system.
A primary advantage of using the auto-configuration is that RAID components
become more independent of the disks they reside on.
For example, SCSI ID's can change, but auto-configured sets will always be
configured correctly, even if the SCSI ID's of the component disks
have become scrambled.
.Pp
Having a system's root file system
.Pq Pa /
on a RAID set is also allowed, with the
.Sq a
partition of such a RAID set being used for
.Pa / .
To use raid0a as the root file system, simply use:
.Bd -literal -offset indent
raidctl -A forceroot raid0
.Ed
.Pp
To return raid0a to be just an auto-configuring set simply use the
.Fl A Ar yes
arguments.
.Pp
Note that kernels can only be directly read from RAID 1 components on
architectures that support that
.Pq currently alpha, i386, pmax, sandpoint, sparc, sparc64, and vax .
On those architectures, the
.Dv FS_RAID
file system is recognized by the bootblocks, and will properly load the
kernel directly from a RAID 1 component.
For other architectures, or to support the root file system
on other RAID sets, some other mechanism must be used to get a kernel booting.
For example, a small partition containing only the secondary boot-blocks
and an alternate kernel (or two) could be used.
Once a kernel is booting, however, and an auto-configuring RAID set is
found that is eligible to be root, then that RAID set will be
auto-configured and used as the root device.
If two or more RAID sets claim to be root devices, then the
user will be prompted to select the root device.
At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
.Pp
A typical RAID 1 setup with root on RAID might be as follows:
.Bl -enum
.It
wd0a - a small partition, which contains a complete, bootable, basic
.Nx
installation.
.It
wd1a - also contains a complete, bootable, basic
.Nx
installation.
.It
wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
.It
wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
swap space.
.It
wd0g and wd1g - a RAID 1 set, raid2, used for
.Pa /usr ,
.Pa /home ,
or other data, if desired.
.It
wd0h and wd1h - a RAID 1 set, raid3, if desired.
.El
.Pp
RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
raid0 is marked as being a root file system.
When new kernels are installed, the kernel is not only copied to
.Pa / ,
but also to wd0a and wd1a.
The kernel on wd0a is required, since that
is the kernel the system boots from.
The kernel on wd1a is also
required, since that will be the kernel used should wd0 fail.
The important point here is to have redundant copies of the kernel
available, in the event that one of the drives fails.
.Pp
There is no requirement that the root file system be on the same disk
as the kernel.
For example, obtaining the kernel from wd0a, and using
sd0e and sd1e for raid0, and the root file system, is fine.
It
.Em is
critical, however, that there be multiple kernels available, in the
event of media failure.
.Pp
Multi-layered RAID devices (such as a RAID 0 set made
up of RAID 1 sets) are
.Em not
supported as root devices or auto-configurable devices at this point.
(Multi-layered RAID devices
.Em are
supported in general, however, as mentioned earlier.)
Note that in order to enable component auto-detection and
auto-configuration of RAID devices, the line:
.Bd -literal -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
must be in the kernel configuration file.
See
.Xr raid 4
for more details.
.Ss Swapping on RAID
A RAID device can be used as a swap device.
In order to ensure that a RAID device used as a swap device
is correctly unconfigured when the system is shut down or rebooted,
it is recommended that the line
.Bd -literal -offset indent
swapoff=YES
.Ed
.Pp
be added to
.Pa /etc/rc.conf .
.Ss Unconfiguration
The final operation performed by
.Nm
is to unconfigure a
.Xr raid 4
device.
This is accomplished via a simple:
.Bd -literal -offset indent
raidctl -u raid0
.Ed
.Pp
at which point the device is ready to be reconfigured.
.Ss Performance Tuning
Selection of the various parameter values which result in the best
performance can be quite tricky, and often requires a bit of
trial-and-error to get those values most appropriate for a given system.
A whole range of factors come into play, including:
.Bl -enum
.It
Types of components (e.g., SCSI vs. IDE) and their bandwidth
.It
Types of controller cards and their bandwidth
.It
Distribution of components among controllers
.It
IO bandwidth
.It
file system access patterns
.It
CPU speed
.El
.Pp
As with most performance tuning, benchmarking under real-life loads
may be the only way to measure expected performance.
Understanding some of the underlying technology is also useful in tuning.
The goal of this section is to provide pointers to those parameters which may
make significant differences in performance.
.Pp
For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
Since data in a RAID 1 set is arranged in a linear
fashion on each component, selecting an appropriate stripe size is
somewhat less critical than it is for a RAID 5 set.
However: a stripe size that is too small will cause large IO's to be
broken up into a number of smaller ones, hurting performance.
At the same time, a large stripe size may cause problems with
concurrent accesses to stripes, which may also affect performance.
Thus values in the range of 32 to 128 are often the most effective.
.Pp
Tuning RAID 5 sets is trickier.
In the best case, IO is presented to the RAID set one stripe at a time.
Since the entire stripe is available at the beginning of the IO,
the parity of that stripe can be calculated before the stripe is written,
and then the stripe data and parity can be written in parallel.
When the amount of data being written is less than a full stripe worth, the
.Sq small write
problem occurs.
Since a
.Sq small write
means only a portion of the stripe on the components is going to
change, the data (and parity) on the components must be updated
slightly differently.
First, the
.Sq old parity
and
.Sq old data
must be read from the components.
Then the new parity is constructed,
using the new data to be written, and the old data and old parity.
Finally, the new data and new parity are written.
All this extra data shuffling results in a serious loss of performance,
and is typically 2 to 4 times slower than a full stripe write (or read).
To combat this problem in the real world, it may be useful
to ensure that stripe sizes are small enough that a
.Sq large IO
from the system will use exactly one large stripe write.
As is seen later, there are some file system dependencies
which may come into play here as well.
.Pp
Since the size of a
.Sq large IO
is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
be desirable to select a SectPerSU value of 16 blocks (8K) or 32
blocks (16K).
Since there are 4 data stripe units per stripe, the maximum
data per stripe is 64 blocks (32K) or 128 blocks (64K).
Again, empirical measurement will provide the best indicators of which
values will yield better performance.
.Pp
The parameters used for the file system are also critical to good performance.
For
.Xr newfs 8 ,
for example, increasing the block size to 32K or 64K may improve
performance dramatically.
As well, changing the cylinders-per-group
parameter from 16 to 32 or higher is often not only necessary for
larger file systems, but may also have positive performance implications.
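.Pp
For example, a file system with 32K blocks and 32 cylinders per group
might be created with:
.Bd -literal -offset indent
newfs -b 32768 -c 32 /dev/rraid0e
.Ed
.Pp
(Consult
.Xr newfs 8
for the exact options supported.)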
.Ss Summary
Despite the length of this manual page, configuring a RAID set is a
relatively straightforward process.
All that needs to be done is the following steps:
.Bl -enum
.It
Use
.Xr disklabel 8
to create the components (of type RAID).
.It
Construct a RAID configuration file: e.g.,
.Pa raid0.conf
.It
Configure the RAID set with:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.It
Initialize the component labels with:
.Bd -literal -offset indent
raidctl -I 123456 raid0
.Ed
.It
Initialize other important parts of the set with:
.Bd -literal -offset indent
raidctl -i raid0
.Ed
.It
Get the default label for the RAID set:
.Bd -literal -offset indent
disklabel raid0 > /tmp/label
.Ed
.It
Edit the label:
.Bd -literal -offset indent
vi /tmp/label
.Ed
.It
Put the new label on the RAID set:
.Bd -literal -offset indent
disklabel -R -r raid0 /tmp/label
.Ed
.It
Create the file system:
.Bd -literal -offset indent
newfs /dev/rraid0e
.Ed
.It
Mount the file system:
.Bd -literal -offset indent
mount /dev/raid0e /mnt
.Ed
.It
Use:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
to re-configure the RAID set the next time it is needed, or put
.Pa raid0.conf
into
.Pa /etc
where it will automatically be started by the
.Pa /etc/rc.d
scripts.
.El
.Sh SEE ALSO
.Xr ccd 4 ,
.Xr raid 4 ,
.Xr rc 8
.Sh HISTORY
RAIDframe is a framework for rapid prototyping of RAID structures
developed by the folks at the Parallel Data Laboratory at Carnegie
Mellon University (CMU).
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
.Pp
The
.Nm
command first appeared as a program in CMU's RAIDframe v1.1 distribution.
This version of
.Nm
is a complete re-write, and first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -literal
The RAIDframe Copyright is as follows:

Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.

Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Carnegie Mellon requests users of this software to return to

 Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system,
or the loss of a single component of a RAID 0 system will
result in the entire file system being lost.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity correct will be catastrophic should a
component ever fail \(em it is better to use RAID 0 and get the
additional space and speed, than it is to use parity, but
not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.
.Pp
When replacing a failed component of a RAID set, it is a good
idea to zero out the first 64 blocks of the new component to ensure the
RAIDframe driver doesn't erroneously detect a component label in the
new component.
This is particularly true on
.Em RAID 1
sets because there is at most one correct component label in a failed RAID
1 installation, and the RAIDframe driver picks the component label with the
highest serial number and modification value as the authoritative source
for the failed RAID set when choosing which component label to use to
configure the RAID set.
.Sh BUGS
Hot-spare removal is currently not available.