.\" $NetBSD: raid.4,v 1.27 2004/02/29 22:04:11 oster Exp $ .\" .\" Copyright (c) 1998 The NetBSD Foundation, Inc. .\" All rights reserved. .\" .\" This code is derived from software contributed to The NetBSD Foundation .\" by Greg Oster .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. All advertising materials mentioning features or use of this software .\" must display the following acknowledgement: .\" This product includes software developed by the NetBSD .\" Foundation, Inc. and its contributors. .\" 4. Neither the name of The NetBSD Foundation nor the names of its .\" contributors may be used to endorse or promote products derived .\" from this software without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR .\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS .\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE .\" POSSIBILITY OF SUCH DAMAGE. 
.\" .\" .\" Copyright (c) 1995 Carnegie-Mellon University. .\" All rights reserved. .\" .\" Author: Mark Holland .\" .\" Permission to use, copy, modify and distribute this software and .\" its documentation is hereby granted, provided that both the copyright .\" notice and this permission notice appear in all copies of the .\" software, derivative works or modified versions, and any portions .\" thereof, and that both notices appear in supporting documentation. .\" .\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" .\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND .\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. .\" .\" Carnegie Mellon requests users of this software to return to .\" .\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU .\" School of Computer Science .\" Carnegie Mellon University .\" Pittsburgh PA 15213-3890 .\" .\" any improvements or extensions that they make and grant Carnegie the .\" rights to redistribute these changes. .\" .Dd February 29, 2004 .Dt RAID 4 .Os .Sh NAME .Nm raid .Nd RAIDframe disk driver .Sh SYNOPSIS .Cd options RAID_AUTOCONFIG .Cd options RAID_DIAGNOSTIC .Cd options RF_ACC_TRACE=n .Cd options RF_DEBUG_MAP=n .Cd options RF_DEBUG_PSS=n .Cd options RF_DEBUG_QUEUE=n .Cd options RF_DEBUG_QUIESCE=n .Cd options RF_DEBUG_STRIPELOCK=n .Cd options RF_DEBUG_RECON=n .Cd options RF_DEBUG_VALIDATE_DAG=n .Cd options RF_DEBUG_VERIFYPARITY=n .Cd options RF_INCLUDE_EVENODD=n .Cd options RF_INCLUDE_RAID5_RS=n .Cd options RF_INCLUDE_PARITYLOGGING=n .Cd options RF_INCLUDE_CHAINDECLUSTER=n .Cd options RF_INCLUDE_INTERDECLUSTER=n .Cd options RF_INCLUDE_PARITY_DECLUSTERING=n .Cd options RF_INCLUDE_PARITY_DECLUSTERING_DS=n .Pp .Cd "pseudo-device raid" Op Ar count .Sh DESCRIPTION The .Nm driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to .Nx . This document assumes that the reader has at least some familiarity with RAID and RAID concepts. 
The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
RAIDframe supports a wide variety of other RAID levels.
The kernel configuration options to enable them are briefly outlined
at the end of this section.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.
The number of failures allowed depends on the parity level selected.
If the driver is able to handle drive failures, and a drive does fail,
then the system is operating in "degraded mode".
In this mode, all missing data must be reconstructed from the data and
parity present on the other components.
This results in much slower data accesses, but does mean that a failure
need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.
If the driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.
A RAID set can be flagged as auto-configurable, in which case it will be
configured automatically during the kernel boot process.
File systems on automatically configured RAID sets are also eligible
to be the root file system.
There is currently only limited support (alpha and pmax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID sets.
To use a RAID set as the root file system, a kernel is usually
obtained from a small non-RAID partition, after which any
auto-configuring RAID set can be used for the root file system.
See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
Note that with auto-configuration of RAID sets, it is no longer
necessary to hard-code SCSI IDs of drives.
The auto-configuration code will correctly configure a device even
after any number of the components have had their device IDs or
device names changed.
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not actively used in an existing
file system.
Should a disk fail, the driver is capable of reconstructing the failed
disk onto a hot spare or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation performed.
The copyback operation, as its name indicates, will copy the
reconstructed data from the hot spare to the previously failed (and
now replaced) disk.
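.Pp
The reconstruction and copyback sequence described above might be
driven from
.Xr raidctl 8
roughly as follows.
This is only a sketch: the device
.Pa raid0
and the component names
.Pa /dev/sd1e
and
.Pa /dev/sd3e
are assumptions chosen for illustration, and
.Xr raidctl 8
remains the authoritative reference for each option.
.Bd -unfilled -offset indent
# add /dev/sd3e (assumed name) to raid0 as a hot spare
raidctl -a /dev/sd3e raid0
# mark /dev/sd1e as failed and reconstruct onto the spare
raidctl -F /dev/sd1e raid0
# check the progress of the reconstruction
raidctl -S raid0
# once the failed disk has been replaced, copy the data back
raidctl -B raid0
.Ed
.Pp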
Hot spares can also be hot-added using
.Xr raidctl 8 .
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.
In particular, this initialization includes re-building the parity data.
This rebuilding of parity data is also required either a) when a new RAID
device is brought up for the first time or b) after an unclean shutdown
of a RAID device.
By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
file system integrity and parity integrity can be ensured.
It bears repeating that parity recomputation is
.Ar required
before any file systems are created or used on the RAID device.
If the parity is not correct, then missing data cannot be correctly
recovered.
.Pp
RAID levels may be combined in a hierarchical fashion.
For example, a RAID 0 device can be constructed out of a number of
RAID 5 devices (which, in turn, may be constructed out of the physical
disks, or of other RAID devices).
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.
The
.Sq count
argument (
.Sq 4 ,
in this case) specifies the number of RAIDframe drivers to configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
auto-configuration of the RAID set is desired.
Since RAIDframe leaves room for disklabels, RAID components can be
simply raw disks, or partitions which use an entire disk.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Ar before
a component failure.
Doing the wrong thing when a component fails may result in data loss.
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Pp
RAIDframe supports an access tracing facility for tracking both
requests made and the performance of various parts of the RAID system
as each request is processed.
To enable this tracing the following option may be specified:
.Bd -unfilled -offset indent
options RF_ACC_TRACE=1
.Ed
.Pp
For extensive debugging there are a number of kernel options which
will aid in performing extra diagnosis of various parts of the
RAIDframe sub-systems.
Note that in order to make full use of these options it is often
necessary to enable one or more debugging options as listed in
.Pa src/sys/dev/raidframe/rf_options.h .
These options are typically useful only to people who wish to debug
various parts of RAIDframe.
The options include:
.Pp
For debugging the code which maps RAID addresses to physical
addresses:
.Bd -unfilled -offset indent
options RF_DEBUG_MAP=1
.Ed
.Pp
Parity stripe status debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_PSS=1
.Ed
.Pp
Additional debugging for queueing is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_QUEUE=1
.Ed
.Pp
Problems with non-quiescent file systems should be easier to debug if
the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_QUIESCE=1
.Ed
.Pp
Stripelock debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_STRIPELOCK=1
.Ed
.Pp
Additional diagnostic checks during reconstruction are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_RECON=1
.Ed
.Pp
Validation of the DAGs (Directed Acyclic Graphs) used to describe an
I/O access can be performed when the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_VALIDATE_DAG=1
.Ed
.Pp
Additional diagnostics during parity verification are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_VERIFYPARITY=1
.Ed
.Pp
There are a number of less commonly used RAID levels supported by
RAIDframe.
These additional RAID types should be considered experimental, and
may not be ready for production use.
The various types and the options to enable them are shown here:
.Pp
For Even-Odd parity:
.Bd -unfilled -offset indent
options RF_INCLUDE_EVENODD=1
.Ed
.Pp
For RAID level 5 with rotated sparing:
.Bd -unfilled -offset indent
options RF_INCLUDE_RAID5_RS=1
.Ed
.Pp
For Parity Logging (highly experimental):
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITYLOGGING=1
.Ed
.Pp
For Chain Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_CHAINDECLUSTER=1
.Ed
.Pp
For Interleaved Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_INTERDECLUSTER=1
.Ed
.Pp
For Parity Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING=1
.Ed
.Pp
For Parity Declustering with Distributed Spares:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING_DS=1
.Ed
.Pp
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system, or the loss
of a single component of a RAID 0 system, will result in all of the
file systems on that RAID device being lost.
RAID is
.Ar NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Ar MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity correct will be catastrophic should a
component ever fail -- it is better to use RAID 0 and get the
additional space and speed, than it is to use parity, but
not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).
RAIDframe, as originally distributed by CMU, provides a RAID
simulator for a number of different architectures, and a user-level
device driver and a kernel device driver for Digital Unix.
The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
School of Computer Science
Carnegie Mellon University
Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed
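.Sh EXAMPLES
The following is a minimal sketch of bringing up a three-component
RAID 5 set, illustrating the configuration and parity initialization
steps described in
.Sx DESCRIPTION .
The component names, the file name
.Pa raid0.conf ,
the serial number, and the layout values are all assumptions chosen
for illustration; consult
.Xr raidctl 8
for the definitive configuration file format and option descriptions.
.Pp
A configuration file for the set might look like:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 3 0
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
START queue
fifo 100
.Ed
.Pp
The set could then be configured, labelled, and have its parity
initialized before any file system is created on it:
.Bd -unfilled -offset indent
# force the first configuration, as no component labels exist yet
raidctl -C raid0.conf raid0
# write component labels using a user-chosen serial number
raidctl -I 2004022901 raid0
# initialize the parity; required before newfs or fsck
raidctl -iv raid0
# optionally flag the set as auto-configurable
raidctl -A yes raid0
.Ed
.Pp
After an unclean shutdown, running
.Xr raidctl 8
with the
.Fl P
option on the same device checks the parity and rewrites it if needed.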