359 lines
14 KiB
Groff
359 lines
14 KiB
Groff
.\" $NetBSD: raid.4,v 1.13 2000/02/26 19:55:01 oster Exp $
|
|
.\"
|
|
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
|
|
.\" All rights reserved.
|
|
.\"
|
|
.\" This code is derived from software contributed to The NetBSD Foundation
|
|
.\" by Greg Oster
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\" 3. All advertising materials mentioning features or use of this software
|
|
.\" must display the following acknowledgement:
|
|
.\" This product includes software developed by the NetBSD
|
|
.\" Foundation, Inc. and its contributors.
|
|
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
|
|
.\" contributors may be used to endorse or promote products derived
|
|
.\" from this software without specific prior written permission.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
|
|
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
|
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
|
|
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
|
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
|
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
|
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
|
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
|
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
|
.\" POSSIBILITY OF SUCH DAMAGE.
|
|
.\"
|
|
.\"
|
|
.\" Copyright (c) 1995 Carnegie-Mellon University.
|
|
.\" All rights reserved.
|
|
.\"
|
|
.\" Author: Mark Holland
|
|
.\"
|
|
.\" Permission to use, copy, modify and distribute this software and
|
|
.\" its documentation is hereby granted, provided that both the copyright
|
|
.\" notice and this permission notice appear in all copies of the
|
|
.\" software, derivative works or modified versions, and any portions
|
|
.\" thereof, and that both notices appear in supporting documentation.
|
|
.\"
|
|
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
|
|
.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
|
|
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
|
|
.\"
|
|
.\" Carnegie Mellon requests users of this software to return to
|
|
.\"
|
|
.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
|
|
.\" School of Computer Science
|
|
.\" Carnegie Mellon University
|
|
.\" Pittsburgh PA 15213-3890
|
|
.\"
|
|
.\" any improvements or extensions that they make and grant Carnegie the
|
|
.\" rights to redistribute these changes.
|
|
.\"
|
|
.Dd November 9, 1998
|
|
.Dt RAID 4
|
|
.Os
|
|
.Sh NAME
|
|
.Nm raid
|
|
.Nd RAIDframe disk driver
|
|
.Sh SYNOPSIS
|
|
.Cd "pseudo-device raid" Op Ar count
|
|
.Sh DESCRIPTION
|
|
The
|
|
.Nm
|
|
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to NetBSD. This
|
|
document assumes that the reader has at least some familiarity with RAID
|
|
and RAID concepts. The reader is also assumed to know how to configure
|
|
disks and pseudo-devices into kernels, how to generate kernels, and how
|
|
to partition disks.
|
|
.Pp
|
|
RAIDframe provides a number of different RAID levels including:
|
|
.Bl -tag -width indent
|
|
.It RAID 0
|
|
provides simple data striping across the components.
|
|
.It RAID 1
|
|
provides mirroring.
|
|
.It RAID 4
|
|
provides data striping across the components, with parity
|
|
stored on a dedicated drive (in this case, the last component).
|
|
.It RAID 5
|
|
provides data striping across the components, with parity
|
|
distributed across all the components.
|
|
.El
|
|
.Pp
|
|
There are a wide variety of other RAID levels supported by RAIDframe,
|
|
including Even-Odd parity, RAID level 5 with rotated sparing, Chained
|
|
declustering, and Interleaved declustering. The reader is referred
|
|
to the RAIDframe documentation mentioned in the
|
|
.Sx HISTORY
|
|
section for more detail on these various RAID configurations.
|
|
.Pp
|
|
Depending on the parity level configured, the device driver can
|
|
support the failure of component drives. The number of failures
|
|
allowed depends on the parity level selected. If the driver is able
|
|
to handle drive failures, and a drive does fail, then the system is
|
|
operating in "degraded mode". In this mode, all missing data must be
|
|
reconstructed from the data and parity present on the other
|
|
components. This results in much slower data accesses, but
|
|
does mean that a failure need not bring the system to a complete halt.
|
|
.Pp
|
|
The RAID driver supports and enforces the use of
|
|
.Sq component labels .
|
|
A
|
|
.Sq component label
|
|
contains important information about the component, including a
|
|
user-specified serial number, the row and column of that component in
|
|
the RAID set, and whether the data (and parity) on the component is
|
|
.Sq clean .
|
|
If the driver determines that the labels are very inconsistent with
|
|
respect to each other (e.g. two or more serial numbers do not match)
|
|
or that the component label is not consistent with it's assigned place
|
|
in the set (e.g. the component label claims the component should be
|
|
the 3rd one a 6-disk set, but the RAID set has it as the 3rd component
|
|
in a 5-disk set) then the device will fail to configure. If the
|
|
driver determines that exactly one component label seems to be
|
|
incorrect, and the RAID set is being configured as a set that supports
|
|
a single failure, then the RAID set will be allowed to configure, but
|
|
the incorrectly labeled component will be marked as
|
|
.Sq failed ,
|
|
and the RAID set will begin operation in degraded mode.
|
|
If all of the components are consistent among themselves, the RAID set
|
|
will configure normally.
|
|
.Pp
|
|
Component labels are also used to support the auto-detection and
|
|
auto-configuration of RAID sets. A RAID set can be flagged as
|
|
auto-configurable, in which case it will be configured automatically
|
|
during the kernel boot process. RAID filesystems which are
|
|
automatically configured are also eligible to be the root filesystem.
|
|
While there is no support for booting directly from a RAID set, it is
|
|
possible to boot from a small partition which contains a kernel, and
|
|
have the root filesystem on a RAID set. See
|
|
.Xr raidctl 8
|
|
for more information on auto-configuration of RAID sets.
|
|
.Pp
|
|
The driver supports
|
|
.Sq hot spares ,
|
|
disks which are on-line, but are not
|
|
actively used in an existing filesystem. Should a disk fail, the
|
|
driver is capable of reconstructing the failed disk onto a hot spare
|
|
or back onto a replacement drive.
|
|
If the components are hot swapable, the failed disk can then be
|
|
removed, a new disk put in its place, and a copyback operation
|
|
performed. The copyback operation, as its name indicates, will copy
|
|
the reconstructed data from the hot spare to the previously failed
|
|
(and now replaced) disk. Hot spares can also be hot-added using
|
|
.Xr raidctl 8 .
|
|
.Pp
|
|
If a component cannot be detected when the RAID device is configured,
|
|
that component will be simply marked as 'failed'.
|
|
.Pp
|
|
The user-land utility for doing all
|
|
.Nm
|
|
configuration and other operations
|
|
is
|
|
.Xr raidctl 8 .
|
|
Most importantly,
|
|
.Xr raidctl 8
|
|
must be used with the
|
|
.Fl i
|
|
option to initialize all RAID sets. In particular, this
|
|
initialization includes re-building the parity data. This rebuilding
|
|
of parity data is also required when either a) a new RAID device is
|
|
brought up for the first time or b) after an un-clean shutdown of a
|
|
RAID device. By using the
|
|
.Fl P
|
|
option to
|
|
.Xr raidctl 8 ,
|
|
and performing this on-demand recomputation of all parity
|
|
before doing a
|
|
.Xr fsck 8
|
|
or a
|
|
.Xr newfs 8 ,
|
|
filesystem integrity and parity integrity can be ensured. It bears
|
|
repeating again that parity recomputation is
|
|
.Ar required
|
|
before any filesystems are created or used on the RAID device. If the
|
|
parity is not correct, then missing data cannot be correctly recovered.
|
|
.Pp
|
|
RAID levels may be combined in a hierarchical fashion. For example, a RAID 0
|
|
device can be constructed out of a number of RAID 5 devices (which, in turn,
|
|
may be constructed out of the physical disks, or of other RAID devices).
|
|
.Pp
|
|
It is important that drives be hard-coded at their respective
|
|
addresses (i.e. not left free-floating, where a drive with SCSI ID of
|
|
4 can end up as /dev/sd0c) for well-behaved functioning of the RAID
|
|
device. This is true for all types of drives, including IDE, HP-IB,
|
|
etc. For normal SCSI drives, for example, the following can be used
|
|
to fix the device addresses:
|
|
.Bd -unfilled -offset indent
|
|
sd0 at scsibus0 target 0 lun ? # SCSI disk drives
|
|
sd1 at scsibus0 target 1 lun ? # SCSI disk drives
|
|
sd2 at scsibus0 target 2 lun ? # SCSI disk drives
|
|
sd3 at scsibus0 target 3 lun ? # SCSI disk drives
|
|
sd4 at scsibus0 target 4 lun ? # SCSI disk drives
|
|
sd5 at scsibus0 target 5 lun ? # SCSI disk drives
|
|
sd6 at scsibus0 target 6 lun ? # SCSI disk drives
|
|
.Ed
|
|
.Pp
|
|
See
|
|
.Xr sd 4
|
|
for more information. The rationale for fixing the device addresses
|
|
is as follows: Consider a system with three SCSI drives at SCSI ID's
|
|
4, 5, and 6, and which map to components /dev/sd0e, /dev/sd1e, and
|
|
/dev/sd2e of a RAID 5 set. If the drive with SCSI ID 5 fails, and the
|
|
system reboots, the old /dev/sd2e will show up as /dev/sd1e. The RAID
|
|
driver is able to detect that component positions have changed, and
|
|
will not allow normal configuration. If the device addresses are hard
|
|
coded, however, the RAID driver would detect that the middle component
|
|
is unavailable, and bring the RAID 5 set up in degraded mode. Note
|
|
that the auto-detection and auto-configuration code does not care
|
|
about where the components live. The auto-configuration code will
|
|
correctly configure a device even after any number of the components
|
|
have been re-arranged.
|
|
.Pp
|
|
The first step to using the
|
|
.Nm
|
|
driver is to ensure that it is suitably configured in the kernel. This is
|
|
done by adding a line similar to:
|
|
.Bd -unfilled -offset indent
|
|
pseudo-device raid 4 # RAIDframe disk device
|
|
.Ed
|
|
.Pp
|
|
to the kernel configuration file. The
|
|
.Sq count
|
|
argument (
|
|
.Sq 4 ,
|
|
in this case), specifies the number of RAIDframe drivers to configure.
|
|
To turn on component auto-detection and auto-configuration of RAID
|
|
sets, simply add:
|
|
.Bd -unfilled -offset indent
|
|
options RAID_AUTOCONFIG
|
|
.Ed
|
|
.Pp
|
|
to the kernel configuration file.
|
|
.Pp
|
|
In all cases the
|
|
.Sq raw
|
|
partitions of the disks
|
|
.Pa must not
|
|
be combined. Rather, each component partition should be offset by at least one
|
|
cylinder from the beginning of that component disk. This ensures that
|
|
the disklabels for the component disks do not conflict with the
|
|
disklabel for the
|
|
.Nm
|
|
device.
|
|
As well, all component partitions must be of the type
|
|
.Dv FS_BSDFFS
|
|
(e.g. 4.2BSD) or
|
|
.Dv FS_RAID .
|
|
The use of the latter is strongly encouraged, and is required if
|
|
auto-configuration of the RAID set is desired.
|
|
.Pp
|
|
A more detailed treatment of actually using a
|
|
.Nm
|
|
device is found in
|
|
.Xr raidctl 8 .
|
|
It is highly recommended that the steps to reconstruct, copyback, and
|
|
re-compute parity are well understood by the system administrator(s)
|
|
.Ar before
|
|
a component failure. Doing the wrong thing when a component fails may
|
|
result in data loss.
|
|
.Pp
|
|
.Sh WARNINGS
|
|
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
|
|
data loss due to component failure. However the loss of two
|
|
components of a RAID 4 or 5 system, or the loss of a single component
|
|
of a RAID 0 system, will result in the entire filesystems on that RAID
|
|
device being lost.
|
|
RAID is
|
|
.Ar NOT
|
|
a substitute for good backup practices.
|
|
.Pp
|
|
Recomputation of parity
|
|
.Ar MUST
|
|
be performed whenever there is a chance that it may have been
|
|
compromised. This includes after system crashes, or before a RAID
|
|
device has been used for the first time. Failure to keep parity
|
|
correct will be catastrophic should a component ever fail -- it is
|
|
better to use RAID 0 and get the additional space and speed, than it
|
|
is to use parity, but not keep the parity correct. At least with RAID
|
|
0 there is no perception of increased data security.
|
|
.Pp
|
|
.Sh FILES
|
|
.Bl -tag -width /dev/XXrXraidX -compact
|
|
.It Pa /dev/{,r}raid*
|
|
.Nm
|
|
device special files.
|
|
.El
|
|
.Pp
|
|
.Sh SEE ALSO
|
|
.Xr MAKEDEV 8 ,
|
|
.Xr raidctl 8 ,
|
|
.Xr config 8 ,
|
|
.Xr fsck 8 ,
|
|
.Xr mount 8 ,
|
|
.Xr newfs 8 ,
|
|
.Xr sd 4
|
|
.Sh HISTORY
|
|
The
|
|
.Nm
|
|
driver in
|
|
.Nx
|
|
is a port of RAIDframe, a framework for rapid prototyping of RAID
|
|
structures developed by the folks at the Parallel Data Laboratory at
|
|
Carnegie Mellon University (CMU). RAIDframe, as originally distributed
|
|
by CMU, provides a RAID simulator for a number of different
|
|
architectures, and a user-level device driver and a kernel device
|
|
driver for Digital Unix. The
|
|
.Nm
|
|
driver is a kernelized version of RAIDframe v1.1.
|
|
.Pp
|
|
A more complete description of the internals and functionality of
|
|
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
|
|
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
|
|
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
|
|
Parallel Data Laboratory of Carnegie Mellon University.
|
|
The
|
|
.Nm
|
|
driver first appeared in
|
|
.Nx 1.4 .
|
|
.Sh COPYRIGHT
|
|
.Bd -unfilled
|
|
|
|
The RAIDframe Copyright is as follows:
|
|
|
|
Copyright (c) 1994-1996 Carnegie-Mellon University.
|
|
All rights reserved.
|
|
|
|
Permission to use, copy, modify and distribute this software and
|
|
its documentation is hereby granted, provided that both the copyright
|
|
notice and this permission notice appear in all copies of the
|
|
software, derivative works or modified versions, and any portions
|
|
thereof, and that both notices appear in supporting documentation.
|
|
|
|
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
|
|
CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
|
|
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
|
|
|
|
Carnegie Mellon requests users of this software to return to
|
|
|
|
Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
|
|
School of Computer Science
|
|
Carnegie Mellon University
|
|
Pittsburgh PA 15213-3890
|
|
|
|
any improvements or extensions that they make and grant Carnegie the
|
|
rights to redistribute these changes.
|
|
|
|
.Ed
|