Commit Graph

689 Commits

Author SHA1 Message Date
oster 10928931ab The switch() in rf_ContinueReconstructFailedDisk() is never actually
used in non-simulation code, and thus is just wasting space (and
making the code more confusing to read!).  Turf the switch, left-shift
the indentation of the code, and nuke the 'state' field of struct RF_RaidReconDesc_s.

No real functional changes.
2004-12-12 20:53:15 +00:00
oster fc7a4ed42c Only touch bufpool whilst in splbio(). (That should be the case
already, but this makes it explicit and safer in the case where
that changes for some reason.)
2004-11-24 13:42:36 +00:00
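
A minimal sketch of the rule this makes explicit, assuming a pool that is
also touched from I/O completion context; the pool and variable names here
are illustrative, not the actual RAIDframe ones:

    int s;
    void *vb;

    s = splbio();                   /* block disk interrupts */
    vb = pool_get(&rf_bufpool, PR_NOWAIT);
    /* ... any other bufpool work happens here, still at raised IPL ... */
    splx(s);                        /* restore the previous level */
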
oster 1f113cb32a Don't allow -f to fail a disk while a reconstruction is taking place
since that would cause a panic.  (Problem noticed by dan@.)
2004-11-17 01:34:10 +00:00
oster e8aee550dd Initialize parity_rewrite_stripes_done to remove the window where
bogus values could be returned at the start of parity rewriting.
2004-11-16 16:52:30 +00:00
oster d7e754c41d On an idea from Thor (tls@), do not fail a component if doing so would
render the RAID set completely dead.  Instead, we retry the IO a
maximum of RF_RETRY_THRESHOLD times (currently '5'), and then just
return an IO error if the IO fails.  This should reduce the damage
caused by having multiple disks appear to fail when the culprit is
really something else (power, controllers, etc.)
2004-11-16 16:45:51 +00:00
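
In outline, the policy above might look like the following sketch.  Only
RF_RETRY_THRESHOLD is taken from the commit; would_fail_last_component(),
the retry counter, and rf_RetryIO() are hypothetical names:

    if (would_fail_last_component(raidPtr, col)) {
            /* Failing 'col' would kill the whole set, so retry. */
            if (++req->retry_count <= RF_RETRY_THRESHOLD)   /* 5 */
                    return rf_RetryIO(raidPtr, req);  /* re-queue the I/O */
            return EIO;     /* report failure, but keep the component */
    }
    rf_FailDisk(raidPtr, col, 1);   /* normal path: fail the disk */
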
oster 5cdd8e2bd5 continueFunc and continueArg aren't used. Turf. Simplify calls to
rf_GetNextReconEvent().
2004-11-15 17:16:28 +00:00
yamt 05f25dcc2a move buffer queue related stuffs from buf.h to their own header, bufq.h. 2004-10-28 07:07:35 +00:00
thorpej fae7cbfa65 rf_find_raid_components():
- If DIOCGDINFO fails with ENOTTY, don't print an error message; wedges
  don't support that ioctl.  Clean up the error message.
- If DIOCGDINFO fails, don't proceed to examine an invalid disklabel
  structure.
2004-10-15 06:41:35 +00:00
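
A sketch of the resulting error handling; rf_getdisklabel() is a
hypothetical stand-in for the ioctl call inside rf_find_raid_components():

    struct disklabel label;
    int error;

    error = rf_getdisklabel(vp, &label);    /* issues DIOCGDINFO */
    if (error != 0) {
            if (error != ENOTTY)    /* wedges don't support DIOCGDINFO */
                    printf("RAIDframe: can't get disklabel, error %d\n",
                        error);
            return;         /* 'label' was never filled in: don't use it */
    }
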
tron 86579fbd2a Make this actually compile. 2004-10-10 11:15:22 +00:00
mrg 6428501b14 when truncating a spare disk, also log what its original size was. 2004-10-10 01:17:40 +00:00
oster bcb300782d Correct some RF_ASSERT()s that were missed when fixing memory issues
with this code.  Thanks to palle at lyckegaard.dk for pointing them
out.  Addresses PR#26776 (but doesn't use all the suggested fixes).
2004-08-27 15:55:50 +00:00
oster da1725a116 rf_CheckLabels() needs to die, but for now, we patch it by setting
fatal_error when too_fatal is set, and by setting fatal_error in a
couple other critical cases.
2004-08-26 17:09:18 +00:00
oster 98af6adb1d The result of rf_DoAccess() should *not* be assigned to bp->b_error.
As well, when we do detect some sort of an error, we should be doing a
biodone() here.  Thanks to yamt for noting the missing biodone(), as
that led to discovery of the additional lossage.
2004-07-01 17:48:45 +00:00
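
Roughly, the corrected completion path (shape only; B_ERROR is the
buffer-error flag of this era):

    /* On error: record it in the buf and finish the I/O properly.
     * The return value of rf_DoAccess() is no longer copied into
     * bp->b_error on the way out. */
    if (error != 0) {
            bp->b_error = error;
            bp->b_flags |= B_ERROR;
            biodone(bp);            /* the previously-missing completion */
            return;
    }
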
oster 4880044636 Remove a (redundant) check that was already performed in raidstart(). 2004-06-29 17:09:01 +00:00
oster 6df1f117b1 Address a number of issues:
1) Introduce functions to allocate and free the emergency IO buffers.

2) Make sure we free any allocated emergency buffers in the event that
we bail out during configuration, or when we unconfigure an array.

3) if we run out of memory trying to allocate a given type of buffer,
don't continue to try to allocate more of those buffers.
(Partially addresses PR#25787)
2004-06-27 03:15:18 +00:00
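
A sketch of how points 1) to 3) might fit together.  The function names
follow the commit's description, but the bodies, the freelist helpers, and
the bufSize/numEmergencyBuffers fields are illustrative:

    static int
    rf_AllocEmergBuffers(RF_Raid_t *raidPtr)
    {
            void *b;
            int i;

            for (i = 0; i < raidPtr->numEmergencyBuffers; i++) {
                    b = malloc(raidPtr->bufSize, M_RAIDFRAME, M_NOWAIT);
                    if (b == NULL)
                            break;  /* point 3: don't keep trying */
                    freelist_put(raidPtr, b);       /* illustrative */
            }
            return (i == 0) ? ENOMEM : 0;
    }

    static void
    rf_FreeEmergBuffers(RF_Raid_t *raidPtr)
    {
            void *b;

            /* Point 2: called both when configuration bails out and
             * on unconfigure, so the buffers never leak. */
            while ((b = freelist_get(raidPtr)) != NULL)
                    free(b, M_RAIDFRAME);
    }
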
drochner e7bdadd856 fix const'ification, gcc-3.4 will notice it 2004-06-02 22:58:28 +00:00
oster 5f5d81ce38 Add support for the word "absent" in the "disks" section of
RAID config files.  Used as a placeholder for a component that
will eventually be added into the set.
2004-05-22 20:56:52 +00:00
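
For example, a "disks" section with a placeholder slot might look like this
(device names are illustrative):

    START disks
    /dev/sd0e
    absent
    /dev/sd2e

The "absent" entry reserves the component's position in the set until a
real disk can be brought in with raidctl later.
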
itojun aca4c091d3 sprintf -> snprintf 2004-04-22 00:17:10 +00:00
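
The usual shape of this change, sketched with an illustrative buffer:

    char devname[64];

    /* was: sprintf(devname, "raid%d", unit);  -- no bounds check */
    snprintf(devname, sizeof(devname), "raid%d", unit);
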
oster 4a82b086a3 Allocating emergency buffer space is all fine and well, but one should really
remember to return the memory when unconfiguring the array.  Same thing goes
for the pool elements used to build the list!
2004-04-10 05:52:33 +00:00
oster 85611189b6 These changes complete the effective removal of malloc() from all
write paths within RAIDframe.  They also resolve the "panics with
RAID 5 sets with more than 3 components" issue which was present
(briefly) in the commits which were previously supposed to address
the malloc() issue.

With this new code the 5-component RAID 5 set panics are now gone.

It is now also possible to swap to RAID 5.

The changes made are:

1) Introduce rf_AllocStripeBuffer() and rf_FreeStripeBuffer() to
allocate/free one stripe's worth of space.  rf_AllocStripeBuffer() is
used in rf_MapUnaccessedPortionOfStripe() where it is not sufficient to
allocate memory using just rf_AllocBuffer().  rf_FreeStripeBuffer() is
called from rf_FreeRaidAccDesc(), well after the DAG is finished.

2) Add a set of emergency "stripe buffers" to struct RF_Raid_s.
Arrange for their initialization in rf_Configure().  In low-memory
situations these buffers will be returned by rf_AllocStripeBuffer()
and re-populated by rf_FreeStripeBuffer().

3) Move RF_VoidPointerListElem_t *iobufs from the dagHeader into
struct RF_RaidAccessDesc_s.  This is more consistent with the
original code, and will not result in items being freed "too early".

4) Add a RF_RaidAccessDesc_t *desc to RF_DagHeader_s so that we have a
way to find desc->iobufs.

5) Arrange for desc in the DagHeader to be initialized in InitHdrNode().

6) Don't cleanup iobufs in rf_FreeDAG() -- the freeing is now delayed
until rf_FreeRaidAccDesc() (which is how the original code handled the
allocList, and for which there seem to be some subtle, undocumented
assumptions).

7) Rename rf_AllocBuffer2() to be rf_AllocBuffer() and remove the
former rf_AllocBuffer().  Fix all callers of rf_AllocBuffer().
(This was how it was *supposed* to be after the last time these
changes were made, before they were backed out).

8) Remove RF_IOBufHeader and all references to it.

9) Remove desc->cleanupList and all references to it.

Fixes PR#20191
2004-04-09 23:10:16 +00:00
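
A sketch of the fallback that points 1) and 2) describe; the signature is
abbreviated and the list handling is illustrative, not the committed code:

    void *
    rf_AllocStripeBuffer(RF_Raid_t *raidPtr, int size)
    {
            void *p = malloc(size, M_RAIDFRAME, M_NOWAIT);

            if (p == NULL) {
                    /* Low memory: hand out one of the emergency stripe
                     * buffers set up in rf_Configure(). */
                    p = emergency_stripebuf_get(raidPtr);   /* sketch */
            }
            return p;
    }

rf_FreeStripeBuffer() is the mirror image: emergency buffers go back onto
the RF_Raid_s list, ordinary ones are simply freed.
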
oster fcea0f7690 We really should have a wakeup in RF_UNLOCK_PSS_MUTEX in case we have
a nap in RF_LOCK_PSS_MUTEX!
2004-04-09 17:01:03 +00:00
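
The pairing being fixed, in a minimal sketch; the structure and names are
illustrative, and protection against lost wakeups is assumed to come from
the caller's spl level, as elsewhere in 2004-era RAIDframe:

    struct pss_mutex { volatile int held; };

    static void
    rf_lock_pss(struct pss_mutex *m)
    {
            while (m->held)         /* the "nap" */
                    (void)tsleep(m, PRIBIO, "psslk", 0);
            m->held = 1;
    }

    static void
    rf_unlock_pss(struct pss_mutex *m)
    {
            m->held = 0;
            wakeup(m);      /* without this, a napping locker never wakes */
    }
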
oster b359c2a356 This assert is outdated, and just plain wrong. 2004-03-23 21:55:23 +00:00
oster 54df291697 Partially back out some changes that were causing grief with
RAID5 sets with more than 3 drives.  Still need to figure out why
the original changes were losing, but need the version in tree reliable
first!

Huge THANKS to Juergen Hannken-Illjes for helping track down
the changes that were causing the lossage.
2004-03-23 21:53:36 +00:00
oster 7dc6ce2f91 Oops... this free should come at the end of the loop. Thanks
to Juergen Hannken-Illjes for pointing it out.
2004-03-23 13:09:18 +00:00
oster bceb7a2778 bufpool must be accessed at splbio(). 2004-03-23 02:34:10 +00:00
oster 7e8ad96008 If the DAG failed, need to make sure we wipe the dagList structures too. 2004-03-22 20:28:57 +00:00
oster 43ccce7d13 Why start a timer, and then just ignore it? *punt* 2004-03-21 21:20:46 +00:00
oster 78d093eaf5 Yesterday's fix to rf_disks.c (rev 1.51) was necessary, but not
sufficient to clobber this nasty little bug.  The behaviour observed
was a panic when doing a 'raidctl -f' on a component when DAGs were
in flight for the given RAID set.  Unfortunately, the faulty behaviour
was very intermittent, and it was difficult not only to reliably
reproduce the bug (or determine when it was fixed!) but also to
figure out what might be the cause of the problem.

The real issue was that ci_vp for the failed component was being
set to NULL in rf_FailDisk(), but with DAGs still in flight, some
of them were still expecting to use ci_vp to determine where to
read from/write to!

The fix is to call rf_SuspendNewRequestsAndWait() from rf_FailDisk()
to make sure the RAID set is quiet and all IOs have completed before
mucking with ci_vp and other data structures.  rf_ResumeNewRequests()
is then used to continue on as usual.
2004-03-21 21:08:08 +00:00
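
The structure of the fix in outline; the suspend/resume routines are the
ones named above, and everything between them is abbreviated:

    int
    rf_FailDisk(RF_Raid_t *raidPtr, int fcol, int initRecon)
    {
            /* Quiesce: wait for all in-flight DAGs to complete, so
             * nothing is still dereferencing ci_vp for this column. */
            rf_SuspendNewRequestsAndWait(raidPtr);

            /* ... mark the component failed, clear ci_vp, etc. ... */

            rf_ResumeNewRequests(raidPtr);  /* continue on as usual */
            return (0);
    }
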
oster 3dd7f5503f Fix a nasty little bug that I've been chasing over the past 12 hours.
If raidPtr->numFailures isn't initialized properly, then all sorts of
whacky things can happen, including incorrect DAGs being generated.
(Triggering this problem is a little esoteric, which is why this bug has
been in hiding for so long -- I only saw it after rebooting with a
degraded RAID 5 set that was autoconfigured, rebuilding the failed
component, and then failing the component while IO was happening to
the RAID set.)
2004-03-21 06:32:03 +00:00
oster 492aa07868 Doesn't hurt much to zero this before we start mucking with it. 2004-03-21 06:16:49 +00:00
oster 01e44f9df5 Add in a couple of missed foo=foo->next's. 2004-03-21 03:22:08 +00:00
oster ac19c32ed5 Can't conditionalize cleanup on numStripeUnitsBailed -- have to
cleanup regardless.

More importantly, we can't free any of the AccessStripeMaps here!
2004-03-20 21:25:55 +00:00
oster 06f16f554f NO_STRIPE_LOCKS is never set, so this code will always execute.
Remove conditionals, and left-shift code.
2004-03-20 17:30:40 +00:00
oster 1966e6afbb Cleanup function prototypes. 2004-03-20 16:48:05 +00:00
oster a7f8d0aef6 [bah.. specifying rf_dagutils.c twice on a checkin doesn't get you
rf_dagutils.h... missed this one from yesterday.  sorry folks :( ]

Change signature of rf_AllocBuffer() to take a dag_h and buffer size
instead of a PDA and an alloclist.  This lets us do the vple dance
inside of rf_AllocBuffer().

Cleanup usage of rf_AllocIOBuffer() and use rf_AllocBuffer() instead.

Fix all uses of rf_AllocBuffer() to conform to the new way of doing
things.
2004-03-20 15:56:21 +00:00
oster 9aa1b6b7c0 Change signature of rf_AllocBuffer() to take a dag_h and buffer size
instead of a PDA and an alloclist.  This lets us do the vple dance
inside of rf_AllocBuffer().

Cleanup usage of rf_AllocIOBuffer() and use rf_AllocBuffer() instead.

Fix all uses of rf_AllocBuffer() to conform to the new way of doing
things.
2004-03-20 05:21:53 +00:00
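
The new calling convention, sketched; the list-element getter and field
names are illustrative, and iobufs still lives in the DAG header at this
point in the history:

    void *
    rf_AllocBuffer(RF_Raid_t *raidPtr, RF_DagHeader_t *dag_h, int size)
    {
            RF_VoidPointerListElem_t *vple;
            void *p;

            p = rf_AllocIOBuffer(raidPtr, size);
            if (p == NULL)
                    return NULL;

            /* The "vple dance": record the allocation on the DAG
             * header so cleanup can find it later. */
            vple = rf_AllocVPListElem();    /* hypothetical getter */
            vple->p = p;
            vple->next = dag_h->iobufs;
            dag_h->iobufs = vple;
            return p;
    }
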
oster 0ff2145648 For each RAID set, pre-allocate a number of "emergency buffers" to be
used in the event that we can't malloc a buffer of the appropriate
size in the traditional way.  rf_AllocIOBuffer() and rf_FreeIOBuffer()
deal with allocating/freeing these structures.  These buffers are
stored in a list on the 'iobuf' list.  iobuf_count keeps track of how
many buffers are available, and numEmergencyBuffers is the effective
"high-water" mark for the freelist.  The buffers allocated by
rf_AllocIOBuffer() are stripe-unit sized, which is the maximum
size requested by any of the callers.

Add an iobufs entry to RF_DagHeader_s.  Use it for keeping track of
buffers that get allocated from the free-list.

Add a "generic list" pool (VoidPointerListElement Pool) for elements
used to maintain a list of allocated memory.  [It is somewhat less
than ideal to add another little pool to handle this...]

Teach rf_AllocBuffer() to use the new rf_AllocIOBuffer().  Modify
other Mallocs to use rf_AllocIOBuffer(), and to update dag_h->iobufs as
appropriate.

Update rf_FreeDAG() to handle cleanup of dag_h->iobufs.

While here, add some missing pool_destroy() calls for a number of pools.

With these changes, it should (in theory) be possible to swap on
RAID 5 sets again.  That said, I've not had any success there yet --
but the last issue I saw at least wasn't in RAIDframe. :-}

[There is room for this code to become a bit more concise, but I
wanted to do a checkpoint here with something known to work :) ]
2004-03-20 04:22:05 +00:00
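
A sketch of the allocate/free pair described above.  The freelist
primitives are illustrative, and it assumes (as the commit notes) that
every buffer is stripe-unit sized, so any buffer may refill the freelist:

    void *
    rf_AllocIOBuffer(RF_Raid_t *raidPtr, int size)
    {
            void *p = malloc(size, M_RAIDFRAME, M_NOWAIT);

            if (p == NULL && raidPtr->iobuf_count > 0) {
                    p = freelist_get(&raidPtr->iobuf);  /* emergency */
                    raidPtr->iobuf_count--;
            }
            return p;   /* NULL only if malloc and freelist are both dry */
    }

    void
    rf_FreeIOBuffer(RF_Raid_t *raidPtr, void *p)
    {
            if (raidPtr->iobuf_count < raidPtr->numEmergencyBuffers) {
                    freelist_put(&raidPtr->iobuf, p);   /* refill */
                    raidPtr->iobuf_count++;
            } else
                    free(p, M_RAIDFRAME);
    }
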
oster 29c6e63ebb dag_node_pool never did get used here. Turf. 2004-03-19 17:04:35 +00:00
oster 1a3e20d5d9 Introduce a dual-purpose pool for providing pointer and param "caches"
for RF_DagNode_t's.  Scale the structure size based on RF_MAXCOL.
Use the new allocation method in InitNode().  Note that we can't get
rid of the mallocs in there until we can prove that this new
allocation method is a strict upper bound.  Unless someone tries
running a RAID set with 40 components, the mallocs here shouldn't
be an issue.  (and if someone does make a set with 40 components
they will run into other issues with other constants long before
then)
2004-03-19 17:01:26 +00:00
oster b2c52e1175 Take care of six more mallocs:
- Pull rf_FreePhysDiskAddr() out from under an #ifdef, since we're now
going to use it.

- Add a pda_cleanup_list into the DAG header.  Use it in rf_FreeDAG() to
cleanup any PDA's that get allocated but have no "easy" way of being
located and freed when the DAG completes.

- numStripeUnitsAccessed is a per-stripe value, and has a maximum
value equal to the number of columns (thus limited by RF_MAXCOL).
Use this knowledge to set an upper bound on overlappingPDAs, and stuff
it on the stack instead of malloc'ing it all the time!  This costs us
a whopping 40 bytes on the stack, but saves a malloc() and a free().
2004-03-19 15:16:18 +00:00
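
The stack change in outline; given the 40 bytes mentioned above, RF_MAXCOL
is evidently 40 here:

    /* numStripeUnitsAccessed <= number of columns <= RF_MAXCOL, so a
     * fixed-size stack array replaces the per-call malloc/free pair. */
    char overlappingPDAs[RF_MAXCOL];

    memset(overlappingPDAs, 0, sizeof(overlappingPDAs));
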
oster 5ac8fbad7f Add a comment. Will hopefully save time next time someone tries
to figure out where the allocated memory is freed.
2004-03-19 02:57:34 +00:00
oster d3810da59b Add a few comments to explain what some of these new structures are, and
where they are used.
2004-03-19 02:34:30 +00:00
oster 208b461a96 Introduce 3 more pools and 6 functions to handle allocating/freeing
elements from the pools.

Re-work rf_SelectAlgorithm() to get rid of all 8 mallocs, and to
use the new functions to get/put these 'support structures'.  I'm not
overly happy with some of the variable names, but them's the breaks.

In the process of changing things, fix a bug:
 - in the case where we can't create a dag, free asmh_b and blockFuncs
too!!

[if you were able to look at the source code related to these changes,
and comprehend what was going on without having your eyes bleed or
getting dizzy, please contact me...  I'm sure I'll have more code
which would benefit by you having a look at it before I commit it :) ]
2004-03-19 02:27:44 +00:00
oster 997983060e Re-work rf_State_Quiesce() so that we don't have to hold a lock
while doing a pool_get().
2004-03-19 01:56:03 +00:00
oster b69e81af97 Remove a debugging line that was accidentally left in. 2004-03-18 17:46:22 +00:00
oster ba5bdf0048 Use rf_AllocDAGNode() to get new DAG nodes. 2004-03-18 17:26:36 +00:00
oster 1051cc745f Re-work the locking mechanisms for reconstruct and PSS structures
such that we don't actually hold a simplelock while we are doing
a pool_get(), but so that we still effectively protect critical code.

This should fix all of the outstanding LOCKDEBUG warnings related to
rebuilding RAID sets.
2004-03-18 16:54:54 +00:00
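
The reordered pattern in a minimal sketch; the pool and mutex names are
illustrative:

    void *elem;

    /* Allocate first: pool_get() with PR_WAITOK may sleep, and
     * sleeping while holding a simplelock is what LOCKDEBUG flags. */
    elem = pool_get(&rf_pss_pool, PR_WAITOK);

    RF_LOCK_MUTEX(raidPtr->mutex);
    /* ... critical section: link 'elem' into the shared structure ... */
    RF_UNLOCK_MUTEX(raidPtr->mutex);
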
oster d4fe1a2103 - Introduce a 'dagnode' pool. Initialize it and allow for cleanup.
Provide rf_AllocDAGNode() and rf_FreeDAGNode() to handle
allocation/freeing.

- Introduce a "nodes" linked list of RF_DagNode_t's into the DAG header.
Initialize nodes in InitHdrNode().  Arrange for nodes cleanup in rf_FreeDAG().

- Add a "list_next" to RF_DagNode_t to keep track of nodes on the
above "nodes" list.  (This is distinct from the "next" field of
RF_DagNode_t, which keeps track of the firing order of nodes.)
"list_next" gets used in the cleanup routines, and in traversing
through a set of nodes that belong to a particular set of nodes
(e.g. those belonging to xorNodes for a given DAG).

- Use rf_AllocDAGNode() instead of mallocs of variable-sized arrays of
RF_DagNode_t's.  Mostly mechanical changes to convert the DAG construction
from "access nodes via an array index" to "access nodes via a 'nextnode'
pointer".

- Rework a couple of tricky spots where assumptions about the node order
were being abused.

- Performance remains consistent with what it was before these changes.

[Thanks to Simon Burge (simonb at you.know.where) for looking over
the mechanical changes to make sure I didn't biff anything.]
2004-03-18 16:40:05 +00:00
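
The two chains this adds, sketched (declaration abbreviated):

    struct RF_DagNode_s {
            /* ... */
            struct RF_DagNode_s *next;      /* firing order in the DAG */
            struct RF_DagNode_s *list_next; /* allocation bookkeeping  */
    };

    /* Cleanup walks the bookkeeping chain, taking care not to touch
     * a node after it has been freed: */
    while ((node = dag_h->nodes) != NULL) {
            dag_h->nodes = node->list_next;
            rf_FreeDAGNode(node);
    }
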
oster 5f5c148f74 raidPtr->num_spare is *NOT* sufficient here. We must allocate
at least RF_MAXSPARE additional spare units, just in case.
2004-03-13 03:32:08 +00:00
oster 8e82e43e0e This desc->mutex is only ever initialized -- never used. *toss* 2004-03-13 02:31:12 +00:00