by RAIDframe. Convert all other RAIDframe global pools to use pools
defined within this new structure.
- Introduce rf_pool_init(), used for initializing a single pool in
  RAIDframe (a sketch appears below). Teach each of the configuration
  routines to use rf_pool_init().
- Clean up a few pool-related comments.
- Clean up revent initialization and #defines.
- Add a missing pool_destroy() for the reconbuffer pool.
(Saves another 1K off of an i386 GENERIC kernel, and makes
stuff a lot more readable)
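[For reference, a minimal sketch of what rf_pool_init() does, assuming
the pool(9) API of the time; the exact prototype and the water-mark
calls are an illustration, not a quote of the committed code:

	static void
	rf_pool_init(struct pool *p, size_t size, const char *w_chan,
		     size_t xmin, size_t xmax)
	{
		pool_init(p, size, 0, 0, 0, w_chan, NULL);
		pool_sethiwat(p, xmax);
		pool_prime(p, xmin);
		pool_setlowat(p, xmin);
	}

Each configuration routine can then set up its pool with one call
instead of open-coding pool_init() and the water marks.]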
rf_AllocBuffer() is available, so use it to get buffer space instead
of the previous RF_Malloc() bits. Saves a few bytes, but more
importantly makes the code much more readable.
- introduce RF_MIN_*'s, as necessary. These will indicate the
  low-water mark for pools as well as the pool_prime() value (see
  the sketch below).
- add pool_setlowat() for the critical pools.
- pool_prime() and pool_setlowat() the raidframe_cbufpool.
- re-order some pool_prime()'s and pool_sethiwat()'s for clarity.
This removes 3 more RF_PANIC()'s (but we'll currently still panic if any of these cases occur).
fix up a few printf's.
XXX: still needs more cleanup and testing (and be taught to not panic).
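[Schematically, per pool, with illustrative names (RF_MIN_FREE_CBUF,
RF_MAX_FREE_CBUF, struct rf_cbuf, and the rf_pools.cbuf member are
made up for the example):

	#define RF_MIN_FREE_CBUF	32	/* low-water mark, and
						   pool_prime() value */
	#define RF_MAX_FREE_CBUF	128	/* high-water mark */

	rf_pool_init(&rf_pools.cbuf, sizeof(struct rf_cbuf),
		     "rf_cbufpl", RF_MIN_FREE_CBUF, RF_MAX_FREE_CBUF);

where rf_pool_init() does the pool_prime() and pool_setlowat() for
the pool, as sketched earlier.]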
- remove callbackArg2 from RF_CallbackDesc_s -- it is only ever set,
never read.
- now that this is done, all callbacks should only take a single argument,
and we can simplify things further.
- change function signature of rf_lookupRUStatus(). The last argument
is now a pointer to a new PSS, in case one is needed. Rather than
having rf_lookupRUStatus() allocate a new PSS, we pre-allocate one
beforehand, where necessary, just in case.
- change callers of rf_lookupRUStatus() to deal with the new way of
calling rf_lookupRUStatus().
[no improvement or worsening of parity rebuild/initialization performance.]
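[A sketch of the new calling convention; the helper names here
(rf_AllocPSStatus()/rf_FreePSStatus()) and the argument order are
illustrative:

	/* pre-allocate a PSS, in case the lookup has to create one */
	newpss = rf_AllocPSStatus(raidPtr);
	pssPtr = rf_lookupRUStatus(raidPtr, pssTable, psID, which_ru,
				   flags, newpss);
	if (pssPtr != newpss)
		rf_FreePSStatus(raidPtr, newpss);	/* not needed */
]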
[For the record: The mcpair mutex is being used to protect mcpair->flag.
mcpair gets allocated before each call to rf_DispatchDAG(), so there is no
other process/thread that could be mucking with it. It is only used to
detect the completion of a given parity unit, and rf_DispatchDAG()
only uses it to set up the callback argument for rf_MCPairWakeupFunc()
which will be called when the IO completes. The code after the call
to rf_DispatchDAG() sits and waits for a 'wakeup' on mcpair->cond
(rf_MCPairWakeupFunc() does that). If mcpair->flag is 0 when
rf_DispatchDAG() completes, then rf_MCPairWakeupFunc() hasn't been
called yet (the IO hasn't completed). If it is 1, then the IO is
already done, and we continue on our merry way without sleeping.
Thus, we don't need to hold any lock on mcpair while calling
rf_DispatchDAG().]
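[The pattern in question, roughly:

	mcpair->flag = 0;
	rf_DispatchDAG(dag_h, rf_MCPairWakeupFunc, (void *) mcpair);
	RF_LOCK_MUTEX(mcpair->mutex);
	while (!mcpair->flag)
		RF_WAIT_COND(mcpair->cond, mcpair->mutex);
	RF_UNLOCK_MUTEX(mcpair->mutex);

The mutex only needs to be held around the flag test and the sleep,
not around the rf_DispatchDAG() call itself.]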
memory. Since we now only ever "return(0)", just return void
instead.
Clean up all uses of rf_ShutdownCreate() to not worry about
it ever failing. Shaves another 600 bytes off of an i386 GENERIC kernel.
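[i.e. callers go from

	if (rf_ShutdownCreate(listp, rf_SomeCleanup, arg))
		/* ... handle the allocation failure ... */

to just

	rf_ShutdownCreate(listp, rf_SomeCleanup, arg);

(rf_SomeCleanup is a stand-in name for whatever cleanup function is
being registered).]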
dynamically allocated variable-sized array (dagArray). Convert code
to use the new linked list stuff instead of the array stuff (the ratio
of one dagList per stripe still applies). The big advantage is in
being able to more efficiently allocate the dagLists on-the-fly, and
not have to know the size(s) of the array beforehand.
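[Illustratively, each dagList becomes a pool-allocated node along the
lines of (the field names are a sketch, not the committed structure):

	typedef struct RF_DagList_s {
		RF_DagHeader_t      *dags;	/* DAGs for this stripe */
		int                  numDags;
		struct RF_DagList_s *next;	/* next stripe's dagList */
	} RF_DagList_t;

so the list can grow one node at a time as the DAGs are built.]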
VOP_STRATEGY(bp) is replaced by one of two new functions:
- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.
DEV_STRATEGY(bp) is used only for block-to-block device situations.
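[So, for example, a dispatch to a component that used to be

	VOP_STRATEGY(bp);

becomes either

	VOP_STRATEGY(vp, bp);	/* via the component's vnode */

or, for the block-to-block case,

	DEV_STRATEGY(bp);	/* via bp->b_dev's d_strategy */
]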
give it back if we don't need it. If we don't allocate it before
we take our lock, LOCKDEBUG (rightfully) complains that we're trying
to grab something from the pool with PR_WAITOK. This code (and the
PR_WAITOK in particular) really needs to be revisited at some point.
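[The shape of the workaround, sketched (the pool and mutex names are
illustrative):

	/* Allocate with PR_WAITOK *before* taking the lock, so we
	 * never sleep in pool_get() while holding it. */
	pss = pool_get(&rf_pools.pss, PR_WAITOK);
	RF_LOCK_PSS_MUTEX(raidPtr, psID);
	/* ... look up the entry; if one already exists ... */
	if (pssPtr != NULL)
		pool_put(&rf_pools.pss, pss);	/* give it back */
]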
skrueger-at-europe-dot-com. (It turns out that the mutex used to
serve two different purposes, not just one, and for its current use,
it's actually misnamed. Will fix that some other time.)
Collapse the related variables down to zero. That means 'flags' is 0
as well. Nuke the extraction macros, a bunch of the variables, and
'flags' as well.
The compiler already knew that these chunks of code
could never be reached (since lu_flag was always 0), so it
already ignored them.
No functional changes.
rf_enableAtomicRMW changes.]
Clean up rf_enableAtomicRMW and its use. According to the comments, we
can't set this to anything other than zero anyway. Shaves off another
900 bytes. lu_flag's days are numbered now, as are the middle
parameters of RF_CREATE_PARAM3.
debugging printf, and in rf_netbsdkintf.c. We can do the calculations
inside of RF_DEBUG_RECON for the one debugging printf, and only
perform the percentCompleted calculation "on demand" in the
rf_netbsdkintf.c case. Shaves a few more bytes off an i386 GENERIC
kernel, and ever-so-slightly decreases the amount of work performed
during a reconstruct.
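[Sketch of the on-demand calculation, assuming the recon map's
unitsLeft/totalRUs fields:

	mapPtr = raidPtr->reconControl->reconMap;
	percentCompleted = 100 - (mapPtr->unitsLeft * 100) /
	    mapPtr->totalRUs;
]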
rf_DecrAccessesCountState wasn't in the correct spot in
RF_AccessState_e. Following up on that has resulted in one other
correction. Changing orderings of these states is tricky, and
shouldn't be attempted without some thorough analysis. For the
changes committed, the following analysis is offered:
1) RAIDframe uses a little state machine to take care of building,
executing, and processing the DAGs used to direct IO.
2) The rf_DecrAccessesCountState state is handled by the function
rf_State_DecrAccessCount(). The purpose of this state is to
decrement the number of "accesses-in-flight".
3) rf_Cleanup_State is handled by rf_State_Cleanup(). Its job is to
do general cleanup of DAG arrays and any stripe locks.
4) DefaultStates[] in rf_layout.c indicates that the right spot
for rf_DecrAccessesCountState is just before rf_Cleanup_State.
Analysis of code for both states indicates that the order doesn't
matter too much, although rf_State_DecrAccessCount() should probably
take place *after* rf_State_Cleanup() to be more correct.
5) Comments in rf_State_ProcessDAG() indicate that the next state
should be rf_Cleanup_State. However, it attempts to get there by using

	desc->state++;

which actually takes it to just rf_DecrAccessesCountState! This turned
out to be OK before, since rf_Cleanup_State would follow right after,
and all would be taken care of (albeit in arguably the "less correct"
order).
6) With the current ordering, if we head directly to rf_Cleanup_State
(as we do, for example, if multiple components fail in a RAID 5 set),
then we'll actually miss going through rf_DecrAccessesCountState, and
could end up never being able to reach quiescence! Perhaps not too
big a deal, given that the RAID set is pretty much toast by the time
such a drastic state change happens, but we might as well have this
correct.
The changes made are:
1) Since having rf_State_DecrAccessCount() come after
rf_State_Cleanup() is just fine, change rf_layout.c to reflect that
rf_DecrAccessesCountState comes after rf_Cleanup_State (i.e. they swap
positions in the state list). This means that going to
rf_Cleanup_State after bailing on a failed DAG access will do all the
right things -- the state will get cleaned up, and then the access
counts will get decremented properly. The comment in
rf_State_ProcessDAG() is now actually correct -- the next state *will*
be rf_Cleanup_State.
2) Move rf_DecrAccessesCountState in RF_AccessState_e to just after
rf_CleanupState. This puts RF_AccessState_e in sync with
DefaultStates[]. Fortunately, these states are rarely referred to by
name, and so this change ends up being mostly cosmetic -- it really
only fixes cleanup behaviour for the recent "Failed to create a DAG"
changes.
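[After the swap, the tail of the state list reads, schematically:

	..., rf_ProcessDAGState, rf_Cleanup_State,
	rf_DecrAccessesCountState, rf_LastState

so "desc->state++" from rf_State_ProcessDAG() really does land on
rf_Cleanup_State, and a direct jump to rf_Cleanup_State still passes
through the access-count decrement on its way out.]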
~forever. This requires a number of things:
1) If we can't create a DAG, set desc->numStripes to 0 in
rf_SelectAlgorithm(). This will ensure that we don't attempt to free
any dagArray[] elements in rf_State_Cleanup().
2) Modify rf_State_CreateDAG() to not panic in the event of a DAG
failure. Instead, set the bp->b_flags and bp->b_error, and set things
up to skip to rf_State_Cleanup() (sketched after this list).
3) Need to mark desc->status as "bad" so that we actually stop looking
for a different DAG (which we won't find... no matter how many times
we try).
4) rf_State_LastState() will then do the biodone(), and return EIO for
the IO in question.
5) Remove some " || 1 "'s from ProcessNode(). These were for
debugging, and we don't need the failure notices spewing
over and over again as the failing DAGs are processed.
6) Needed to change

	if (asmap->numDataFailed + asmap->numParityFailed > 1)

to

	if ((asmap->numDataFailed + asmap->numParityFailed > 1) ||
	    (raidPtr->numFailures > 1)) {

in rf_raid5.c so that it doesn't try to return
rf_CreateNonRedundantWriteDAG as the creation function.
7) Note that we can't apply the above change to the RAID 1 code as
with the silly "fake 2-D" RAID 1 sets, it is possible to have 2 failed
components in the RAID 1 set, and that would stop them from working.
(I really don't know why/how those "fake 2-D" RAID 1 sets even work
with all the "single-fault" assumptions present in the rest of the
code.)
8) Needed to protect rf_RAID0DagSelect() in a similar way -- it should
return NULL as the createFunc.
9) No point printing out "Multiple disks failed..." a zillion times.
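[Putting 2) and 3) together, the failure path in rf_State_CreateDAG()
now looks roughly like this (the exact status value is illustrative):

	if (desc->numStripes == 0) {
		/* DAG creation failed: fail the I/O instead of
		   panicking, and skip ahead to the cleanup state */
		bp->b_flags |= B_ERROR;
		bp->b_error = EIO;
		desc->status = 1;	/* mark the access as bad */
		desc->state = rf_Cleanup_State;
	}
]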
	RF_DAG_RETURN_DAG
	RF_DAG_RETURN_ASM
	RF_DAG_TEST_ACCESS
and the code that goes with them. A couple more of these
can probably go too, but I might need them in a bit.
bp->b_proc for mapping userspace buffers to kernelspace in the
original rf_kintf.c. That means bp isn't of any use in RF_BZERO()
for us, and the macro can be replaced with just the memset().
No functional changes.
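[That is, each

	RF_BZERO(bp, addr, len);

collapses to a plain

	memset(addr, 0, len);

since the bp argument only existed for the old userspace-mapping
case.]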