Commit Graph

694 Commits

Author SHA1 Message Date
oster
c38bce14f6 Vastly improve the error handling in the case of a read/write error
that occurs during a reconstruction.  We go from zero error handling
and likely panicking if something goes amiss, to gracefully bailing and
leaving the system in the best, usable state possible.

- introduce rf_DrainReconEventQueue() to allow easy cleaning of the
reconstruction event queue

- change how we clean up the floating recon buffers in
rf_FreeReconControl().  Detect the end of the list rather
than traversing according to a count.

- keep track of the number of pending reconstruction writes.  In the
event of a read error, use this to wait long enough for the pending
writes to (hopefully) drain.

- more cleanup is still needed on this code, but I didn't want to
start mixing major functional changes with minor cleanups.

XXX: There is a known issue with pool items left outstanding due to
the IO failure, and this can show up in the form of a panic at the
tail end of a shutdown.  This problem is much less severe than before
these changes, and the hope/plan is that this problem will go away
once this code gets overhauled again.
2005-02-05 23:32:43 +00:00
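A rough C sketch of two ideas from this commit. The first is draining an event queue by walking to the list's NULL terminator instead of trusting a count. The second is tracking pending writes so that teardown can wait for them. The struct and function names (recon_event, recon_ctrl, drain_event_queue(), and so on) are hypothetical. They are not RAIDframe's actual rf_DrainReconEventQueue() or its data structures, and the kernel locking the real code needs is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reconstruction event (not RAIDframe's struct). */
struct recon_event {
	int type;
	struct recon_event *next;
};

/* Hypothetical reconstruction control block. */
struct recon_ctrl {
	struct recon_event *eq_head;  /* pending events */
	int pending_writes;           /* in-flight reconstruction writes */
};

static int
post_event(struct recon_ctrl *rc, int type)
{
	struct recon_event *ev = malloc(sizeof(*ev));

	if (ev == NULL)
		return -1;
	ev->type = type;
	ev->next = rc->eq_head;
	rc->eq_head = ev;
	return 0;
}

/* Free every queued event, stopping at the NULL terminator, not a count. */
static int
drain_event_queue(struct recon_ctrl *rc)
{
	int n = 0;
	struct recon_event *ev = rc->eq_head;

	while (ev != NULL) {
		struct recon_event *next = ev->next;  /* read before freeing */
		free(ev);
		ev = next;
		n++;
	}
	rc->eq_head = NULL;
	return n;
}

static void
recon_write_start(struct recon_ctrl *rc)
{
	rc->pending_writes++;
}

static void
recon_write_done(struct recon_ctrl *rc)
{
	assert(rc->pending_writes > 0);
	rc->pending_writes--;
}

/* After a read error, teardown should wait until this returns nonzero. */
static int
recon_can_teardown(const struct recon_ctrl *rc)
{
	return rc->pending_writes == 0;
}
```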
oster
c18a242754 Torch some #define's missed in last commit. 2005-01-22 02:24:31 +00:00
oster
3140947870 Reconstruction Descriptors are only allocated once per reconstruction,
and don't need their own pool or freelist or anything fancier than a
malloc/free.
2005-01-22 02:22:44 +00:00
oster
26187fa579 ForceReconReadDoneProc() needs a return after doing the first
rf_CauseReconEvent().
2005-01-18 03:29:51 +00:00
oster
320bcedf91 After walking through desc->dagList nuking entries, make sure
desc->dagList is set to NULL before continuing.  If we don't,
there's a danger that we'll try to re-free these items later.
(This should fix a panic reported to me via private communication.)
2005-01-14 01:33:15 +00:00
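A minimal C sketch of the pattern this fix describes, assuming a simplified singly linked DAG list. The code frees every entry and then resets the head pointer, so a later cleanup pass finds an empty list instead of freed memory. The structure names are hypothetical stand-ins for desc->dagList.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplified DAG list entry. */
struct dag_list {
	struct dag_list *next;
};

/* Hypothetical access descriptor holding the list head. */
struct access_desc {
	struct dag_list *dagList;
};

static void
free_dag_list(struct access_desc *desc)
{
	struct dag_list *dl = desc->dagList;

	while (dl != NULL) {
		struct dag_list *next = dl->next;  /* read before freeing */
		free(dl);
		dl = next;
	}
	/* Reset the head so a later pass sees an empty list, not freed memory. */
	desc->dagList = NULL;
}
```

Calling free_dag_list() a second time is then harmless, which is exactly the re-free scenario the commit guards against.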
oster
10928931ab The switch() in rf_ContinueReconstructFailedDisk() is never actually
used in non-simulation code, and thus is just wasting space (and
making the code more confusing to read!).  Turf the switch, left-shift
the indentation of code, and nuke 'state' field of struct RF_RaidReconDesc_s.

No real functional changes.
2004-12-12 20:53:15 +00:00
oster
fc7a4ed42c Only touch bufpool whilst in splbio(). (That should be the case
already, but this makes it explicit and safer in the case where
that changes for some reason.)
2004-11-24 13:42:36 +00:00
oster
1f113cb32a Don't allow -f to fail a disk while a reconstruction is taking place
since that would cause a panic.  (Problem noticed by dan@.)
2004-11-17 01:34:10 +00:00
oster
e8aee550dd Initialize parity_rewrite_stripes_done to remove the window where
bogus values could be returned at the start of parity rewriting.
2004-11-16 16:52:30 +00:00
oster
d7e754c41d On an idea from Thor (tls@), do not fail a component if doing so would
render the RAID set completely dead.  Instead, we retry the IO a
maximum of RF_RETRY_THRESHOLD times (currently '5'), and then just
return an IO error if the IO fails.  This should reduce the damage
caused by having multiple disks appear to fail when the culprit is
really something else (power, controllers, etc.)
2004-11-16 16:45:51 +00:00
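The retry policy can be sketched in C as follows. RF_RETRY_THRESHOLD and its value of 5 come from the commit message. The io_fn callback, the outcome enum, the max_failures parameter, and the flaky_disk test double are all hypothetical. In RAIDframe the real decision depends on the layout's fault tolerance and component state.

```c
#include <assert.h>

#define RF_RETRY_THRESHOLD 5  /* value given in the commit message */

/* Hypothetical I/O callback: returns 0 on success, nonzero on failure. */
typedef int (*io_fn)(void *arg);

enum io_outcome {
	IO_FAIL_COMPONENT,  /* set survives, so failing the component is safe */
	IO_RETRY_OK,        /* a retry succeeded */
	IO_GAVE_UP          /* retries exhausted: return an I/O error */
};

/*
 * Called after an I/O to a component has failed.  num_failed is the number
 * of components already failed and max_failures is how many the set can
 * tolerate (e.g. 1 for RAID 5); both parameters are assumptions.
 */
static enum io_outcome
handle_io_failure(io_fn io, void *arg, int num_failed, int max_failures)
{
	if (num_failed < max_failures)
		return IO_FAIL_COMPONENT;
	/* Failing another component would kill the set, so retry instead. */
	for (int attempt = 0; attempt < RF_RETRY_THRESHOLD; attempt++) {
		if (io(arg) == 0)
			return IO_RETRY_OK;
	}
	return IO_GAVE_UP;
}

/* Hypothetical test double: a disk that fails a given number of times. */
struct flaky_disk {
	int failures_left;
	int calls;
};

static int
flaky_io(void *arg)
{
	struct flaky_disk *d = arg;

	d->calls++;
	if (d->failures_left > 0) {
		d->failures_left--;
		return -1;
	}
	return 0;
}
```

With a degraded RAID 5 set (num_failed 1, max_failures 1), a transient error that clears on the fourth try is absorbed, while a persistent one costs exactly RF_RETRY_THRESHOLD attempts before an error is returned.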
oster
5cdd8e2bd5 continueFunc and continueArg aren't used. Turf. Simplify calls to
rf_GetNextReconEvent().
2004-11-15 17:16:28 +00:00
yamt
05f25dcc2a move buffer queue related stuff from buf.h to its own header, bufq.h. 2004-10-28 07:07:35 +00:00
thorpej
fae7cbfa65 rf_find_raid_components():
- If DIOCGDINFO failed with ENOTTY, don't print an error message; wedges
  don't support that ioctl.  Clean up the error message.
- If DIOCGDINFO fails, don't proceed to examine an invalid disklabel
  structure.
2004-10-15 06:41:35 +00:00
tron
86579fbd2a Make this actually compile. 2004-10-10 11:15:22 +00:00
mrg
6428501b14 when truncating a spare disk, also log what its original size was. 2004-10-10 01:17:40 +00:00
oster
bcb300782d Correct some RF_ASSERT()s that were missed when fixing memory issues
with this code.  Thanks to palle at lyckegaard.dk for pointing them
out.  Addresses PR#26776 (but doesn't use all the suggested fixes).
2004-08-27 15:55:50 +00:00
oster
da1725a116 rf_CheckLabels() needs to die, but for now, we patch it by setting
fatal_error when too_fatal is set, and by setting fatal_error in a
couple of other critical cases.
2004-08-26 17:09:18 +00:00
oster
98af6adb1d The result of rf_DoAccess() should *not* be assigned to bp->b_error.
As well, when we do detect some sort of an error, we should be doing a
biodone() here.  Thanks to yamt for noting the missing biodone(), as
that led to discovery of the additional lossage.
2004-07-01 17:48:45 +00:00
oster
4880044636 Remove a (redundant) check that was already performed in raidstart(). 2004-06-29 17:09:01 +00:00
oster
6df1f117b1 Address a number of issues:
1) Introduce functions to allocate and free the emergency IO buffers.

2) Make sure we free any allocated emergency buffers in the event that
we bail out during configuration, or when we unconfigure an array.

3) If we run out of memory trying to allocate a given type of buffer,
don't continue to try to allocate more of those buffers.
(Partially addresses PR#25787)
2004-06-27 03:15:18 +00:00
drochner
e7bdadd856 fix const-ification; gcc-3.4 will notice it 2004-06-02 22:58:28 +00:00
oster
5f5d81ce38 Add support for the word "absent" in the "disks" section of
RAID config files.  Used as a placeholder for a component that
will eventually be added into the set.
2004-05-22 20:56:52 +00:00
itojun
aca4c091d3 sprintf -> snprintf 2004-04-22 00:17:10 +00:00
oster
4a82b086a3 Allocating emergency buffer space is all fine and well, but one should really
remember to return the memory when unconfiguring the array.  Same thing goes
for the pool elements used to build the list!
2004-04-10 05:52:33 +00:00
oster
85611189b6 These changes complete the effective removal of malloc() from all
write paths within RAIDframe.  They also resolve the "panics with
RAID 5 sets with more than 3 components" issue which was present
(briefly) in the commits which were previously supposed to address
the malloc() issue.

With this new code the 5-component RAID 5 set panics are now gone.

It is now also possible to swap to RAID 5.

The changes made are:

1) Introduce rf_AllocStripeBuffer() and rf_FreeStripeBuffer() to
allocate/free one stripe's worth of space.  rf_AllocStripeBuffer() is
used in rf_MapUnaccessedPortionOfStripe() where it is not sufficient to
allocate memory using just rf_AllocBuffer().  rf_FreeStripeBuffer() is
called from rf_FreeRaidAccDesc(), well after the DAG is finished.

2) Add a set of emergency "stripe buffers" to struct RF_Raid_s.
Arrange for their initialization in rf_Configure().  In low-memory
situations these buffers will be returned by rf_AllocStripeBuffer()
and re-populated by rf_FreeStripeBuffer().

3) Move RF_VoidPointerListElem_t *iobufs from the dagHeader into
struct RF_RaidAccessDesc_s.  This is more consistent with the
original code, and will not result in items being freed "too early".

4) Add a RF_RaidAccessDesc_t *desc to RF_DagHeader_s so that we have a
way to find desc->iobufs.

5) Arrange for desc in the DagHeader to be initialized in InitHdrNode().

6) Don't cleanup iobufs in rf_FreeDAG() -- the freeing is now delayed
until rf_FreeRaidAccDesc() (which is how the original code handled the
allocList, and for which there seem to be some subtle, undocumented
assumptions).

7) Rename rf_AllocBuffer2() to be rf_AllocBuffer() and remove the
former rf_AllocBuffer().  Fix all callers of rf_AllocBuffer().
(This was how it was *supposed* to be after the last time these
changes were made, before they were backed out).

8) Remove RF_IOBufHeader and all references to it.

9) Remove desc->cleanupList and all references to it.

Fixes PR#20191
2004-04-09 23:10:16 +00:00
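Items 3) to 6) amount to moving buffer ownership from the DAG to the access descriptor. The C sketch below is simplified, and its struct and function names are hypothetical; they only echo iobufs, desc, and rf_FreeRaidAccDesc(). A buffer allocated on behalf of a DAG is recorded on the descriptor, which the DAG header reaches through its desc pointer. The buffer is released only when the descriptor itself is torn down, not when the DAG is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list element, in the spirit of RF_VoidPointerListElem_t. */
struct vp_list_elem {
	void *p;
	struct vp_list_elem *next;
};

/* Hypothetical access descriptor: it owns every buffer used by its DAGs. */
struct access_desc {
	struct vp_list_elem *iobufs;
};

/* Hypothetical DAG header: only a back-pointer to its descriptor. */
struct dag_header {
	struct access_desc *desc;
};

/* Allocate a buffer for a DAG and record it on desc->iobufs. */
static void *
dag_alloc_buffer(struct dag_header *dag_h, size_t size)
{
	struct vp_list_elem *e = malloc(sizeof(*e));

	if (e == NULL)
		return NULL;
	e->p = malloc(size);
	if (e->p == NULL) {
		free(e);
		return NULL;
	}
	e->next = dag_h->desc->iobufs;
	dag_h->desc->iobufs = e;
	return e->p;
}

/* Release all buffers; meant to run when the descriptor is freed. */
static int
free_access_desc_bufs(struct access_desc *desc)
{
	int n = 0;

	while (desc->iobufs != NULL) {
		struct vp_list_elem *e = desc->iobufs;

		desc->iobufs = e->next;
		free(e->p);
		free(e);
		n++;
	}
	return n;
}
```

Tearing down the DAG leaves desc->iobufs untouched in this scheme, so nothing is freed "too early" while the access is still completing.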
oster
fcea0f7690 We really should have a wakeup in RF_UNLOCK_PSS_MUTEX in case we have
a nap in RF_LOCK_PSS_MUTEX!
2004-04-09 17:01:03 +00:00
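The bug here is a classic lost wakeup. If acquiring a lock can nap, then releasing it must wake the nappers. The C sketch below runs in user space and uses POSIX threads as a stand-in for the kernel's sleep/wakeup primitives. sleep_lock, its functions, and the bump() demo worker are hypothetical. They are not the actual RF_LOCK_PSS_MUTEX/RF_UNLOCK_PSS_MUTEX macros.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical sleep lock: a "held" flag guarded by a mutex + condvar. */
struct sleep_lock {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	int held;
};

static void
sleep_lock_init(struct sleep_lock *l)
{
	pthread_mutex_init(&l->mtx, NULL);
	pthread_cond_init(&l->cv, NULL);
	l->held = 0;
}

static void
sleep_lock_acquire(struct sleep_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	while (l->held)
		pthread_cond_wait(&l->cv, &l->mtx);  /* the "nap" */
	l->held = 1;
	pthread_mutex_unlock(&l->mtx);
}

static void
sleep_lock_release(struct sleep_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	assert(l->held);
	l->held = 0;
	/* The missing piece: without this, a napping thread may never wake. */
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mtx);
}

/* Demo worker: bump a shared counter under the sleep lock. */
struct bump_arg {
	struct sleep_lock *lock;
	int *counter;
	int iterations;
};

static void *
bump(void *p)
{
	struct bump_arg *a = p;

	for (int i = 0; i < a->iterations; i++) {
		sleep_lock_acquire(a->lock);
		(*a->counter)++;
		sleep_lock_release(a->lock);
	}
	return NULL;
}
```

If the broadcast in sleep_lock_release() is removed, two bump() threads contending for the lock can hang forever, which mirrors the symptom the commit fixes.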
oster
b359c2a356 This assert is outdated, and just plain wrong. 2004-03-23 21:55:23 +00:00
oster
54df291697 Partially back out some changes that were causing grief with
RAID5 sets with more than 3 drives.  Still need to figure out why
the original changes were losing, but need the version in tree reliable
first!

Huge THANKS to Juergen Hannken-Illjes for helping track down
the changes that were causing the lossage.
2004-03-23 21:53:36 +00:00
oster
7dc6ce2f91 Ooops.. this free should come at the end of the loop. Thanks
to Juergen Hannken-Illjes for pointing it out.
2004-03-23 13:09:18 +00:00
oster
bceb7a2778 bufpool must be accessed at splbio(). 2004-03-23 02:34:10 +00:00
oster
7e8ad96008 If the DAG failed, need to make sure we wipe the dagList structures too. 2004-03-22 20:28:57 +00:00
oster
43ccce7d13 Why start a timer, and then just ignore it? *punt* 2004-03-21 21:20:46 +00:00
oster
78d093eaf5 Yesterday's fix to rf_disks.c (rev 1.51) was necessary, but not
sufficient to clobber this nasty little bug.  The behaviour observed
was a panic when doing a 'raidctl -f' on a component when DAGs were
in flight for the given RAID set.  Unfortunately, the faulty behaviour
was very intermittent, and it was difficult not only to reproduce the
bug reliably (or to determine when it was fixed!) but even to figure
out what might be the cause of the problem.

The real issue was that ci_vp for the failed component was being
set to NULL in rf_FailDisk(), but with DAGs still in flight, some
of them were still expecting to use ci_vp to determine where to
read to/write from!

The fix is to call rf_SuspendNewRequestsAndWait() from rf_FailDisk()
to make sure the RAID set is quiet and all IOs have completed before
mucking with ci_vp and other data structures.  rf_ResumeNewRequests()
is then used to continue on as usual.
2004-03-21 21:08:08 +00:00
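This C sketch shows the quiesce-then-modify ordering described above. It runs in user space, with POSIX threads in place of kernel primitives. The raid_quiesce structure and its functions are hypothetical. They only mirror the idea behind rf_SuspendNewRequestsAndWait() and rf_ResumeNewRequests(), and component_vp stands in for ci_vp.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical quiesce state for one RAID set. */
struct raid_quiesce {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	int in_flight;   /* I/Os currently in progress */
	bool suspended;  /* new I/Os must wait while this is set */
};

static void
quiesce_init(struct raid_quiesce *q)
{
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->cv, NULL);
	q->in_flight = 0;
	q->suspended = false;
}

static void
io_begin(struct raid_quiesce *q)
{
	pthread_mutex_lock(&q->mtx);
	while (q->suspended)
		pthread_cond_wait(&q->cv, &q->mtx);
	q->in_flight++;
	pthread_mutex_unlock(&q->mtx);
}

static void
io_end(struct raid_quiesce *q)
{
	pthread_mutex_lock(&q->mtx);
	assert(q->in_flight > 0);
	if (--q->in_flight == 0)
		pthread_cond_broadcast(&q->cv);
	pthread_mutex_unlock(&q->mtx);
}

static void
suspend_and_wait(struct raid_quiesce *q)
{
	pthread_mutex_lock(&q->mtx);
	q->suspended = true;
	while (q->in_flight > 0)
		pthread_cond_wait(&q->cv, &q->mtx);
	pthread_mutex_unlock(&q->mtx);
}

static void
resume_requests(struct raid_quiesce *q)
{
	pthread_mutex_lock(&q->mtx);
	q->suspended = false;
	pthread_cond_broadcast(&q->cv);
	pthread_mutex_unlock(&q->mtx);
}

/* Clear per-component state only once no I/O can still be using it. */
static void
fail_disk(struct raid_quiesce *q, void **component_vp)
{
	suspend_and_wait(q);
	*component_vp = NULL;
	resume_requests(q);
}
```

A thread calling fail_disk() must not itself be counted in in_flight. Otherwise suspend_and_wait() would wait forever.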
oster
3dd7f5503f Fix a nasty little bug that I've been chasing over the past 12 hours.
If raidPtr->numFailures isn't initialized properly, then all sorts of
whacky things can happen, including incorrect DAGs being generated.
(Triggering this problem is a little esoteric, which is why this bug has
been in hiding for so long -- I only saw it after rebooting with a
degraded RAID 5 set that was autoconfigured, rebuilding the failed
component, and then failing the component while IO was happening to
the RAID set.)
2004-03-21 06:32:03 +00:00
oster
492aa07868 Doesn't hurt much to zero this before we start mucking with it. 2004-03-21 06:16:49 +00:00
oster
01e44f9df5 Add in a couple of missed foo=foo->next's. 2004-03-21 03:22:08 +00:00
oster
ac19c32ed5 Can't conditionalize cleanup on numStripeUnitsBailed -- have to
cleanup regardless.

More importantly, we can't free any of the AccessStripeMaps here!
2004-03-20 21:25:55 +00:00
oster
06f16f554f NO_STRIPE_LOCKS is never set, so this code will always execute.
Remove conditionals, and left-shift code.
2004-03-20 17:30:40 +00:00
oster
1966e6afbb Cleanup function prototypes. 2004-03-20 16:48:05 +00:00
oster
a7f8d0aef6 [bah.. specifying rf_dagutils.c twice on a checkin doesn't get you
rf_dagutils.h... missed this one from yesterday.  sorry folks :( ]

Change signature of rf_AllocBuffer() to take a dag_h and buffer size
instead of a PDA and an alloclist.  This lets us do the vple dance
inside of rf_AllocBuffer().

Cleanup usage of rf_AllocIOBuffer() and use rf_AllocBuffer() instead.

Fix all uses of rf_AllocBuffer() to conform to the new way of doing
things.
2004-03-20 15:56:21 +00:00
oster
9aa1b6b7c0 Change signature of rf_AllocBuffer() to take a dag_h and buffer size
instead of a PDA and an alloclist.  This lets us do the vple dance
inside of rf_AllocBuffer().

Cleanup usage of rf_AllocIOBuffer() and use rf_AllocBuffer() instead.

Fix all uses of rf_AllocBuffer() to conform to the new way of doing
things.
2004-03-20 05:21:53 +00:00
oster
0ff2145648 For each RAID set, pre-allocate a number of "emergency buffers" to be
used in the event that we can't malloc a buffer of the appropriate
size in the traditional way.  rf_AllocIOBuffer() and rf_FreeIOBuffer()
deal with allocating/freeing these structures.  These buffers are
stored on the 'iobuf' list.  iobuf_count keeps track of how
many buffers are available, and numEmergencyBuffers is the effective
"high-water" mark for the freelist.  The buffers allocated by
rf_AllocIOBuffer() are stripe-unit sized, which is the maximum
size requested by any of the callers.

Add an iobufs entry to RF_DagHeader_s.  Use it for keeping track of
buffers that get allocated from the free-list.

Add a "generic list" pool (VoidPointerListElement Pool) for elements
used to maintain a list of allocated memory.  [It is somewhat less
than ideal to add another little pool to handle this...]

Teach rf_AllocBuffer() to use the new rf_AllocIOBuffer().  Modify
other Mallocs to use rf_AllocIOBuffer(), and to update dag_h->iobufs as
appropriate.

Update rf_FreeDAG() to handle cleanup of dag_h->iobufs.

While here, add some missing pool_destroy() calls for a number of pools.

With these changes, it should (in theory) be possible to swap on
RAID 5 sets again.  That said, I've not had any success there yet --
but the last issue I saw at least wasn't in RAIDframe. :-}

[There is room for this code to become a bit more concise, but I
wanted to do a checkpoint here with something known to work :) ]
2004-03-20 04:22:05 +00:00
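Below is a simplified C sketch of an emergency-buffer freelist along the lines described here. Buffers are preallocated at configuration time, and preallocation stops at the first allocation failure. They are handed out only when the normal allocator fails, and they are returned to the list until it is back at its high-water mark. Field names loosely follow iobuf, iobuf_count, and numEmergencyBuffers from the commit message. The layout, the pluggable alloc hook, and always_fail() are assumptions, added so that the fallback path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Idle buffers store the freelist link inside themselves. */
struct free_buf {
	struct free_buf *next;
};

/* Hypothetical per-set state. */
struct raid {
	struct free_buf *iobuf;       /* emergency freelist */
	int iobuf_count;              /* buffers currently on the freelist */
	int numEmergencyBuffers;      /* high-water mark for the freelist */
	size_t su_size;               /* stripe-unit size in bytes */
	void *(*alloc)(size_t);       /* normal allocator, e.g. malloc */
};

/* Stand-in allocator that always fails, to exercise the fallback path. */
static void *
always_fail(size_t n)
{
	(void)n;
	return NULL;
}

static void
push_free(struct raid *r, void *p)
{
	struct free_buf *fb = p;

	fb->next = r->iobuf;
	r->iobuf = fb;
	r->iobuf_count++;
}

/* Preallocate the emergency pool; stop at the first failure. */
static int
config_emergency(struct raid *r, int n)
{
	assert(r->su_size >= sizeof(struct free_buf));
	r->numEmergencyBuffers = n;
	for (int i = 0; i < n; i++) {
		void *p = malloc(r->su_size);

		if (p == NULL)
			break;  /* out of memory: don't keep trying */
		push_free(r, p);
	}
	return r->iobuf_count;
}

/* Every buffer is stripe-unit sized, the largest any caller needs. */
static void *
alloc_io_buffer(struct raid *r)
{
	void *p = r->alloc(r->su_size);

	if (p == NULL && r->iobuf != NULL) {
		struct free_buf *fb = r->iobuf;

		r->iobuf = fb->next;
		r->iobuf_count--;
		p = fb;
	}
	return p;
}

/* Refill the emergency pool first; free anything beyond the mark. */
static void
free_io_buffer(struct raid *r, void *p)
{
	if (r->iobuf_count < r->numEmergencyBuffers)
		push_free(r, p);
	else
		free(p);
}

/* Return the pool's memory when the set is unconfigured. */
static void
unconfig_emergency(struct raid *r)
{
	while (r->iobuf != NULL) {
		struct free_buf *fb = r->iobuf;

		r->iobuf = fb->next;
		free(fb);
	}
	r->iobuf_count = 0;
}
```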
oster
29c6e63ebb dag_node_pool never did get used here. Turf. 2004-03-19 17:04:35 +00:00
oster
1a3e20d5d9 Introduce a dual-purpose pool for providing pointer and param "caches"
for RF_DagNode_t's.  Scale the structure size based on RF_MAXCOL.
Use the new allocation method in InitNode().  Note that we can't get
rid of the mallocs in there until we can prove that this new
allocation method is a strict upper bound.  Unless someone tries
running a RAID set with 40 components, the mallocs here shouldn't
be an issue.  (And if someone does make a set with 40 components,
they will run into other issues with other constants long before
then)
2004-03-19 17:01:26 +00:00
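A minimal C sketch of the "preallocated cache, malloc fallback" scheme. RF_MAXCOL is assumed to be 40 here, based only on the remark about 40-component sets. node_cache, node_array, and their functions are hypothetical. They stand in for the pool elements that InitNode() draws from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define RF_MAXCOL 40  /* assumed value */

/* Hypothetical fixed-size cache, big enough for RF_MAXCOL entries. */
struct node_cache {
	void *slots[RF_MAXCOL];
};

/* Hypothetical array handle remembering where its storage came from. */
struct node_array {
	void **v;
	bool cached;
};

/* Use the preallocated cache when n fits; otherwise fall back to malloc. */
static int
node_array_get(struct node_array *a, struct node_cache *c, int n)
{
	if (n <= RF_MAXCOL) {
		a->v = c->slots;
		a->cached = true;
		return 0;
	}
	a->v = malloc((size_t)n * sizeof(void *));
	a->cached = false;
	return a->v == NULL ? -1 : 0;
}

static void
node_array_put(struct node_array *a)
{
	if (!a->cached)
		free(a->v);
	a->v = NULL;
}
```

Keeping the malloc() branch is the conservative choice the commit describes. It stays correct even if RF_MAXCOL turns out not to be a strict upper bound.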
oster
b2c52e1175 Take care of six more mallocs:
- Pull rf_FreePhysDiskAddr() out from under a #ifdef, since we're now
going to use it.

- Add a pda_cleanup_list into the DAG header.  Use it in rf_FreeDAG() to
cleanup any PDA's that get allocated but have no "easy" way of being
located and freed when the DAG completes.

- numStripeUnitsAccessed is a per-stripe value, and has a maximum
value equal to the number of columns (thus limited by RF_MAXCOL).
Use this knowledge to set a high-bound on overlappingPDAs, and stuff
it on the stack instead of malloc'ing it all the time!  This costs us
a whopping 40 bytes on the stack, but saves a malloc() and a free().
2004-03-19 15:16:18 +00:00
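numStripeUnitsAccessed never exceeds the number of columns, so a fixed RF_MAXCOL-sized array can replace the per-call malloc(). This hypothetical C sketch assumes RF_MAXCOL is 40 with one byte per flag, which would be consistent with the 40-byte figure above. The helper names and the column-matching test for "overlap" are illustrative only.

```c
#include <assert.h>
#include <string.h>

#define RF_MAXCOL 40  /* assumed value */

/* Hypothetical: flag accessed stripe units whose column is the failed one. */
static void
mark_overlaps(char *overlapping, int nsu, const int *su_col, int failed_col)
{
	for (int i = 0; i < nsu; i++)
		overlapping[i] = (su_col[i] == failed_col);
}

static int
process_stripe(int numStripeUnitsAccessed, const int *su_col, int failed_col)
{
	/* Bounded by RF_MAXCOL, so this can live on the stack. */
	char overlappingPDAs[RF_MAXCOL];
	int n = 0;

	assert(numStripeUnitsAccessed <= RF_MAXCOL);
	memset(overlappingPDAs, 0, sizeof(overlappingPDAs));
	mark_overlaps(overlappingPDAs, numStripeUnitsAccessed, su_col,
	    failed_col);
	for (int i = 0; i < numStripeUnitsAccessed; i++)
		n += overlappingPDAs[i];
	return n;
}
```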
oster
5ac8fbad7f Add a comment. Will hopefully save time next time someone tries
to figure out where the allocated memory is freed.
2004-03-19 02:57:34 +00:00
oster
d3810da59b Add a few comments to explain what some of these new structures are, and
where they are used.
2004-03-19 02:34:30 +00:00
oster
208b461a96 Introduce 3 more pools and 6 functions to handle allocating/freeing
elements from the pools.

Re-work rf_SelectAlgorithm() to get rid of all 8 mallocs, and to
use the new functions to get/put these 'support structures'.  I'm not
overly happy with some of the variable names, but them's the breaks.

In the process of changing things, fix a bug:
 - in the case where we can't create a dag, free asmh_b and blockFuncs
too!!

[if you were able to look at the source code related to these changes,
and comprehend what was going on without having your eyes bleed or
getting dizzy, please contact me...  I'm sure I'll have more code
which would benefit by you having a look at it before I commit it :) ]
2004-03-19 02:27:44 +00:00
oster
997983060e Re-work rf_State_Quiesce() so that we don't have to hold a lock
while doing a pool_get().
2004-03-19 01:56:03 +00:00
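The reworked ordering can be sketched like this. Allocate first, then take the lock only to inspect state and link the item in. If the allocation turns out to be unneeded, release it after dropping the lock. This C sketch uses malloc() and a pthread mutex as stand-ins for pool_get() and the kernel lock. quiesce_state and quiesce_or_wait() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical waiter record, standing in for a pool-allocated item. */
struct waiter {
	struct waiter *next;
	int id;
};

struct quiesce_state {
	pthread_mutex_t mtx;
	struct waiter *waiters;
	int in_flight;
};

/* Returns 0 if already quiescent, 1 if queued, -1 if allocation failed. */
static int
quiesce_or_wait(struct quiesce_state *s, int id)
{
	/* Allocate before locking: the allocator may sleep. */
	struct waiter *w = malloc(sizeof(*w));

	if (w == NULL)
		return -1;
	w->id = id;

	pthread_mutex_lock(&s->mtx);
	if (s->in_flight == 0) {
		pthread_mutex_unlock(&s->mtx);
		free(w);  /* not needed after all; released outside the lock */
		return 0;
	}
	w->next = s->waiters;
	s->waiters = w;
	pthread_mutex_unlock(&s->mtx);
	return 1;
}
```

The cost of this ordering is an occasional wasted allocation. In exchange, nothing that can sleep ever runs while the lock is held.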
oster
b69e81af97 Remove a debugging line that was accidentally left in. 2004-03-18 17:46:22 +00:00