incorporating a used spare is automatic.
Copyback has always been an issue, as to do a copyback all IO to
the array had to be suspended, and so was very, very unlikely to
have been used in anything resembling a production system.
based on the usage pattern:
raidctl <device> create <level> <component1> <component2> ...
For example,
raidctl raid0 create mirror absent /dev/wd1e
will create a RAID level 1 (mirror) set with an absent first component
and /dev/wd1e as the second component. The resulting RAID device will
be marked as auto-configurable, will have a serial number set (based
on the current time), and parity will be initialized. Reasonable
performance values are automatically used by default for other parameters
normally specified in the configuration file.
Also: Only print out Autoconfig status if being verbose.
Implement a long desired feature of automatically incorporating
a used spare into the array after a reconstruct.
Given the configuration:
Components:
/dev/wd0e: failed
/dev/wd1e: optimal
/dev/wd2e: optimal
Spares:
/dev/wd3e: spare
Running 'raidctl -F /dev/wd0e raid0' will now result in the
following configuration after a successful rebuild:
Components:
/dev/wd3e: optimal
/dev/wd1e: optimal
/dev/wd2e: optimal
No spares.
Thanks to manu@ for the development of the initial set of changes
which allowed the changes to automatically incorporate a used spare
to come to fruition. Thanks also to manu@ for useful discussions
about and additional testing of these changes.
Rename compiler-warning-disable variables from
GCC_NO_warning
to
CC_WNO_warning
where warning is the full warning name as used by the compiler.
GCC_NO_IMPLICIT_FALLTHRU is CC_WNO_IMPLICIT_FALLTHROUGH
Using the convention CC_compilerflag, where compilerflag
is based on the full compiler flag name.
If getfsspecname() fails that will usually mean that a NAME=wedge or
ROOT.x partition is unabailable. raidframe specified unavailable
partitions as "absent" so in this case, pass "absent" rather than the
unaltered NAME= or ROOT.x string, which the kernel has no clue what
do do with, and doesn't configure the raid at all.
This does the same config file parse that -c/-C do, but only
that (hence no raidframe device is needed, or accepted).
Any syntax errors in the config file will be reported, nothing
else happens.
First, and what got me started on this set of cleanups, the queue
length in the "queue" section (START queue) is limited to what will
fit in a char without losing accuracy (I tried setting it to 200,
rather than the more common (universal?) 100 and found that the
value configured into the array was -56 instead.
Why the value needs to be passed through a char variable I have no
idea (it is an int in the filesystem raidframe headers) - but that's
the way it is done, and changing it would be an ABI change I believe
(and so need versioning to alter) and that isn't worth it for this
(or not now, IMO).
Instead check that the value in the char is the same value as was
read from the config file, and complain if not. Those of you with
unsigned chars will be able to have queue lengths up to 255, the
rest of us are limited to 127.
While looking at that, I noticed some code that obviously fails to
understand that scanf("%s") will never return a string containing
spaces, and proceeded to attempt to remove trailing spaces from the
result ... amusingly, after having used the result for its intended
purpose (non existent trailing spaces unremoved), after which that
buffer was never used again. That code is now gone (but for now,
just #if 0'd rather than actually deleted - it should be cleaned up
sometime).
Then I saw some other issues with how the config was parsed - a
simple (unbounded) scanf("%s") into a buffer, which hypothetically
might not be large enough (not a security issue really, raidctl has
no special privs, and it isn't likely that root could easily be
tricked into running it on a bogus config file - or not without
looking first anyway, and a huge long string would rather stand
out). Bound the string length to something reasonable, and
assert() that the buffer is big enough to contain it.
Lastly, in the event of one particular detected error in the
config file, the code would write a warning, but then just go
ahead and use the bad data (or nothing perhaps) anyway - a
failure of logic flow (unlikely to have ever happened, everyone
seems to simply copy the sample config from the man page, and
make minor adjustments as needed).
If any of these changes make any difference to anyone (except
me with my attempt to make longer queues - for no particularly
well thought out reason), I'd be very surprised.
the following comment appeared:
/*
* After NetBSD 9, convert this to not output the numRow's value,
* which is no longer required or ever used.
*/
We are after NetBSD 9 (well after). The change requested in that
comment is made here, and the comment is thus removed.
A couple of places in rf_configure.c where a value for the "rows"
parameter was output in an error message (always simply as the
constant 0) have also been updated (those messages will no longer
include "row 0", which they always said previously). One of them
was also slightly reworded to be clearer what problem it was
experiencing (when it said 'unable to get device file' it meant
it was unable to locate the name for the device in the config file,
not that it was found, and there was some other problem with it).
GCC_NO_FORMAT_TRUNCATION -Wno-format-truncation (GCC 7/8)
GCC_NO_STRINGOP_TRUNCATION -Wno-stringop-truncation (GCC 8)
GCC_NO_STRINGOP_OVERFLOW -Wno-stringop-overflow (GCC 8)
GCC_NO_CAST_FUNCTION_TYPE -Wno-cast-function-type (GCC 8)
use these to turn off warnings for most GCC-8 complaints. many
of these are false positives, most of the real bugs are already
commited, or are yet to come.
we plan to introduce versions of (some?) of these that use the
"-Wno-error=" form, which still displays the warnings but does
not make it an error, and all of the above will be re-considered
as either being "fix me" (warning still displayed) or "warning
is wrong."
- remove casts when the same type is used on both sides
- expand hours_buffer[] to fit the range of hours in an 'int'
- add a work around for the sprintf() truncation checker that fails
to detect that 'minutes' and 'seconds' have a small range
convert several raidframe ioctls to be bitsize idempotent so that
they work the same in 32 and 64 bit worlds, allowing netbsd32 to
configure and query raid properly. remove useless 'row' in a few
places. add COMPAT_80 and put the old ioctls there.
raidframeio.h:
RAIDFRAME_TEST_ACC
- remove, unused
RAIDFRAME_GET_COMPONENT_LABEL
- convert to label not pointer to label
RAIDFRAME_CHECK_RECON_STATUS_EXT
RAIDFRAME_CHECK_PARITYREWRITE_STATUS_EXT
RAIDFRAME_CHECK_COPYBACK_STATUS_EXT
- convert to progress info not pointer to info
RAIDFRAME_GET_INFO
- version entirely.
raidframevar.h:
- rf_recon_req{} has row, flags and raidPtr removed (they're
not a useful part of this interface.)
- RF_Config_s{} and RF_DeviceConfig_s{} have numRow/rows removed.
- RF_RaidDisk_s{} is re-ordered slightly to fix alignment
padding - the actual data was already OK.
- InstallSpareTable() loses row argument
rf_compat32.c has code for RF_Config_s{} in 32 bit mode, used
by RAIDFRAME_CONFIGURE and RAIDFRAME_GET_INFO32.
rf_compat80.c has code for rf_recon_req{}, RF_RaidDisk_s{} and
RF_DeviceConfig_s{} to handle RAIDFRAME_FAIL_DISK,
RAIDFRAME_GET_COMPONENT_LABEL, RAIDFRAME_CHECK_RECON_STATUS_EXT,
RAIDFRAME_CHECK_PARITYREWRITE_STATUS_EXT,
RAIDFRAME_CHECK_COPYBACK_STATUS_EXT, RAIDFRAME_GET_INFO.
move several of the per-ioctl code blocks into separate functions.
add rf_recon_req_internal{} to replace old usage of global
rf_recon_req{} that had unused void * in the structure, ruining
it's 32/64 bit ABI.
add missing case for RAIDFRAME_GET_INFO50.
adjust raid tests to use the new .conf format, and add a case to
test the old method as well.
raidctl:
deal with lack of 'row' members in a couple of places.
fail request no longer takes row.
handle "START array" sections with just "numCol numSpare", ie
no "numRow" specified. for now, generate old-style configuration
but update raidctl.8 to specify the new style (keeping reference
to the old style.)
note that: RF_ComponentLabel_s::{row,num_rows} and
RF_SingleComponent_s::row are obsolete but not removed yet.
1. Don't force use of "for" when "while" works better.
2. No need to check c != '\0' when we also check (c == ' ' || c == '\t')
3. Use the size of the buffer we're using, rather than a different one
(not really a concern, they're the same size)
4. Don't use fscanf() to read file data, use fgets() & sscanf().
5. After using a pointer as a char *, validate alignment before switching
to int * (can only fail if kernel #define gets set stupidly) Or #6...
6. Validate sparemap file name isn't too long for assigned space.
7. recognise that strlen() returns size_t - don't shove it into an int.
8. On out of mem, be more clear which allocation failed in warning msg.
ATF tests all pass. But I don't think they use sparemap files.
Coincidentally, the change also works around a gcc 5.1 bug which causes
a segmentation fault when trying to compile the longer version (guess
the compiler got exhausted, or something).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66345
Replace atoi(3) by strtol(3), and check that numbers are valid,
positive, and in int32_t range. The previous lack of check could
silently lead to the same serial being set to all RAID volumes
for instance because given numbers were bigger than INT_MAX. The
consequence is in an awful mess when RAIDframe would mix volumes...