RAID5 sets with more than 3 drives. Still need to figure out why
the original changes were losing, but the version in the tree needs to
be reliable first!
Huge THANKS to Juergen Hannken-Illjes for helping track down
the changes that were causing the lossage.
ExtremeRAID with NV cache) and "dumb" (3ware 6410) ld providers.
Instead, use the default buffer queue policy.
With the 3ware adapter, using the read priority strategy instead of FCFS,
three extractions of pkgsrc took 329 seconds instead of 331, but
with a dramatic improvement in perceived system response (latency for
I/O outside the main stream).
With the Mylex adapter, the improvement was dramatic: using read priority
instead of FCFS yielded an improvement from 381 seconds to 135 seconds!
There was a less-noticeable improvement in perceived latency as well.
The other disk drivers currently hard-wired to FCFS or another policy
should probably be changed as well.
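For readers unfamiliar with the read priority strategy, here is a minimal C sketch of the idea under a simplified model: reads and writes wait in separate FIFOs, reads are served first, and after READ_BURST consecutive reads a waiting write is let through so that writes cannot starve. The types, function names and the READ_BURST value are hypothetical and are not the bufq(9) interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request type; not the kernel's struct buf. */
struct req {
    bool is_read;
    struct req *next;
};

struct fifo {
    struct req *head, *tail;
};

/* Assumed limit on consecutive reads while a write is waiting. */
#define READ_BURST 4

struct prioq {
    struct fifo reads, writes;
    int read_burst;     /* reads served since the last write */
};

static void fifo_put(struct fifo *f, struct req *r)
{
    r->next = NULL;
    if (f->tail != NULL)
        f->tail->next = r;
    else
        f->head = r;
    f->tail = r;
}

static struct req *fifo_get(struct fifo *f)
{
    struct req *r = f->head;

    if (r != NULL) {
        f->head = r->next;
        if (f->head == NULL)
            f->tail = NULL;
    }
    return r;
}

/* Queue a request on the FIFO matching its direction. */
void prioq_put(struct prioq *q, struct req *r)
{
    fifo_put(r->is_read ? &q->reads : &q->writes, r);
}

/* Prefer reads, but serve a waiting write after READ_BURST reads. */
struct req *prioq_get(struct prioq *q)
{
    struct req *r = NULL;

    assert(q != NULL);
    if (q->reads.head != NULL &&
        (q->writes.head == NULL || q->read_burst < READ_BURST)) {
        r = fifo_get(&q->reads);
        q->read_burst++;
    } else if (q->writes.head != NULL) {
        r = fifo_get(&q->writes);
        q->read_burst = 0;
    }
    return r;
}
```

A driver would call prioq_put() from its strategy routine and prioq_get() when the device can accept the next request; FCFS corresponds to a single FIFO with no distinction between reads and writes.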
sufficient to clobber this nasty little bug. The behaviour observed
was a panic when doing a 'raidctl -f' on a component when DAGs were
in flight for the given RAID set. Unfortunately, the faulty behaviour
was very intermittent: it was difficult not only to reproduce the bug
reliably (or to determine when it was fixed!) but even to figure out
what might be the cause of the problem.
The real issue was that ci_vp for the failed component was being
set to NULL in rf_FailDisk(), but with DAGs still in flight, some
of them were still expecting to use ci_vp to determine where to
read from/write to!
The fix is to call rf_SuspendNewRequestsAndWait() from rf_FailDisk()
to make sure the RAID set is quiet and all IOs have completed before
mucking with ci_vp and other data structures. rf_ResumeNewRequests()
is then used to continue on as usual.
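The suspend/wait/resume sequence can be sketched in userland C with a mutex and a condition variable, assuming a simplified model in which each DAG brackets its work with io_begin()/io_end(). All names here are hypothetical stand-ins for rf_SuspendNewRequestsAndWait(), rf_ResumeNewRequests() and ci_vp, not RAIDframe's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a RAID set. */
struct raidset {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int   in_flight;    /* I/Os (DAGs) currently executing */
    bool  suspended;    /* new requests are held while true */
    void *comp_vp;      /* stand-in for one component's ci_vp */
};

/* Called before starting an I/O; blocks while the set is suspended. */
void io_begin(struct raidset *rs)
{
    pthread_mutex_lock(&rs->lock);
    while (rs->suspended)
        pthread_cond_wait(&rs->cv, &rs->lock);
    rs->in_flight++;
    pthread_mutex_unlock(&rs->lock);
}

/* Called when an I/O completes; wakes waiters once the set is idle. */
void io_end(struct raidset *rs)
{
    pthread_mutex_lock(&rs->lock);
    assert(rs->in_flight > 0);
    if (--rs->in_flight == 0)
        pthread_cond_broadcast(&rs->cv);
    pthread_mutex_unlock(&rs->lock);
}

/* Hold off new requests and wait for in-flight ones to drain. */
static void suspend_and_wait(struct raidset *rs)
{
    pthread_mutex_lock(&rs->lock);
    rs->suspended = true;
    while (rs->in_flight > 0)
        pthread_cond_wait(&rs->cv, &rs->lock);
    pthread_mutex_unlock(&rs->lock);
}

/* Let held requests proceed again. */
static void resume(struct raidset *rs)
{
    pthread_mutex_lock(&rs->lock);
    rs->suspended = false;
    pthread_cond_broadcast(&rs->cv);
    pthread_mutex_unlock(&rs->lock);
}

/* Clear the component pointer only once nothing can be using it. */
void fail_component(struct raidset *rs)
{
    suspend_and_wait(rs);
    rs->comp_vp = NULL;
    resume(rs);
}
```

Compile with -pthread. The property that matters is that comp_vp is cleared only after in_flight has reached zero and while io_begin() is holding off any new request that might look at it.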
If raidPtr->numFailures isn't initialized properly, then all sorts of
wacky things can happen, including incorrect DAGs being generated.
(Triggering this problem is a little esoteric, which is why this bug has
been in hiding for so long: I only saw it after rebooting with a
degraded RAID 5 set that was autoconfigured, rebuilding the failed
component, and then failing the component while IO was happening to
the RAID set.)
rf_dagutils.h... missed this one from yesterday. sorry folks :( ]
Change signature of rf_AllocBuffer() to take a dag_h and buffer size
instead of a PDA and an alloclist. This lets us do the vple dance
inside of rf_AllocBuffer().
Cleanup usage of rf_AllocIOBuffer() and use rf_AllocBuffer() instead.
Fix all uses of rf_AllocBuffer() to conform to the new way of doing
things.
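A minimal sketch of why passing the DAG header helps: the allocator can record each buffer on a list hanging off the header, and freeing the DAG then frees everything on that list. The structures and function names are hypothetical simplifications of the vple bookkeeping, not RAIDframe's real definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list element in the spirit of a "void pointer list". */
struct vple {
    void *p;
    struct vple *next;
};

/* Hypothetical, cut-down DAG header: only the allocation list. */
struct dag_header {
    struct vple *allocs;    /* buffers to free along with the DAG */
};

/* Allocate a buffer and record it in the DAG that owns it. */
void *dag_alloc_buffer(struct dag_header *dag_h, size_t size)
{
    struct vple *e;
    void *buf;

    assert(dag_h != NULL);
    e = malloc(sizeof(*e));
    buf = malloc(size);
    if (e == NULL || buf == NULL) {
        free(e);
        free(buf);
        return NULL;
    }
    e->p = buf;
    e->next = dag_h->allocs;
    dag_h->allocs = e;
    return buf;
}

/* Free every buffer the DAG allocated, then the list itself. */
void dag_free_buffers(struct dag_header *dag_h)
{
    struct vple *e = dag_h->allocs;

    while (e != NULL) {
        struct vple *next = e->next;
        free(e->p);
        free(e);
        e = next;
    }
    dag_h->allocs = NULL;
}
```

Callers no longer need to carry an alloclist around: cleanup is tied to the DAG's lifetime.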
used in the event that we can't malloc a buffer of the appropriate
size in the traditional way. rf_AllocIOBuffer() and rf_FreeIOBuffer()
deal with allocating/freeing these structures. These buffers are
kept on the 'iobuf' list. iobuf_count keeps track of how
many buffers are available, and numEmergencyBuffers is the effective
"high-water" mark for the freelist. The buffers allocated by
rf_AllocIOBuffer() are stripe-unit sized, which is the maximum
size requested by any of the callers.
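The emergency free list can be sketched as follows, assuming every buffer is bufsize bytes (the stripe-unit size) and that a free buffer's first bytes can hold the list link. The structure and function names are hypothetical; count and high_water play the roles of iobuf_count and numEmergencyBuffers.

```c
#include <assert.h>
#include <stdlib.h>

/* While a buffer sits on the free list, its first bytes hold the link. */
struct iobuf {
    struct iobuf *next;
};

struct iobuf_pool {
    struct iobuf *free;  /* hypothetical counterpart of the 'iobuf' list */
    int count;           /* buffers on the list (cf. iobuf_count) */
    int high_water;      /* list never grows past this (cf. numEmergencyBuffers) */
    size_t bufsize;      /* stripe-unit size, the largest any caller asks for */
};

/* Put a buffer on the list, or really free it if the list is full. */
void iobuf_free(struct iobuf_pool *pool, void *buf)
{
    struct iobuf *b = buf;

    if (pool->count >= pool->high_water) {
        free(buf);
        return;
    }
    b->next = pool->free;
    pool->free = b;
    pool->count++;
}

/* Preallocate up to n emergency buffers; returns how many were made. */
int iobuf_pool_init(struct iobuf_pool *pool, int n, size_t bufsize)
{
    assert(bufsize >= sizeof(struct iobuf));
    pool->free = NULL;
    pool->count = 0;
    pool->high_water = n;
    pool->bufsize = bufsize;
    for (int i = 0; i < n; i++) {
        void *buf = malloc(bufsize);
        if (buf == NULL)
            break;
        iobuf_free(pool, buf);
    }
    return pool->count;
}

/* Take a buffer from the list; NULL when the list is exhausted. */
void *iobuf_alloc(struct iobuf_pool *pool)
{
    struct iobuf *b = pool->free;

    if (b != NULL) {
        pool->free = b->next;
        pool->count--;
    }
    return b;
}
```

A caller would first try an ordinary malloc() and only call iobuf_alloc() if that fails, remembering which buffers came from the pool so they go back through iobuf_free().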
Add an iobufs entry to RF_DagHeader_s. Use it for keeping track of
buffers that get allocated from the free-list.
Add a "generic list" pool (VoidPointerListElement Pool) for elements
used to maintain a list of allocated memory. [It is somewhat less
than ideal to add another little pool to handle this...]
Teach rf_AllocBuffer() to use the new rf_AllocIOBuffer(). Modify
other Mallocs to use rf_AllocIOBuffer(), and to update dag_h->iobufs as
appropriate.
Update rf_FreeDAG() to handle cleanup of dag_h->iobufs.
While here, add some missing pool_destroy() calls for a number of pools.
With these changes, it should (in theory) be possible to swap on
RAID 5 sets again. That said, I've not had any success there yet --
but the last issue I saw at least wasn't in RAIDframe. :-}
[There is room for this code to become a bit more concise, but I
wanted to do a checkpoint here with something known to work :) ]
Rx interrupts, functions to post a request for new table entries, and
code to apply pending Rx-interrupt control values at the next hardware
interrupt.
As used in a third-party proprietary tree since at least March 2003.
As discussed on tech-kern/tech-net in January 2004 (in the context of
NetBSD for packet capture, bpf, and FreeBSD-style IFF_POLL), and as
posted to tech-net for comments in mid-March 2004.
Still missing sysctl or other knobs to actually change the config-time
values, due to my ignorance of any accepted per-device sysctl namespace.
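A minimal sketch of the deferred-update scheme, assuming a small static table of (ticks, max_bds) pairs and a single pending index: posting a request only records it, and the interrupt handler later copies the chosen entry into the (simulated) chip registers. The table values, field names and functions are hypothetical, and the real driver's registers and interrupt-level protection are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Rx interrupt-mitigation settings. */
struct rx_coal {
    uint32_t ticks;     /* coalescing timer */
    uint32_t max_bds;   /* frames received before an interrupt is forced */
};

/* Illustrative values only. */
static const struct rx_coal rx_coal_table[] = {
    { 0, 1 }, { 50, 16 }, { 150, 64 },
};
#define RX_COAL_NENTRIES (sizeof(rx_coal_table) / sizeof(rx_coal_table[0]))

struct softc {
    struct rx_coal hw;  /* stand-in for the chip's registers */
    int pending;        /* requested table index, or -1 for none */
};

/* Post a request; it takes effect at the next interrupt, not now. */
bool rx_coal_post(struct softc *sc, int idx)
{
    if (idx < 0 || (size_t)idx >= RX_COAL_NENTRIES)
        return false;
    sc->pending = idx;
    return true;
}

/* Called from the interrupt handler: apply any pending settings. */
void rx_coal_intr(struct softc *sc)
{
    if (sc->pending >= 0) {
        assert((size_t)sc->pending < RX_COAL_NENTRIES);
        sc->hw = rx_coal_table[sc->pending];
        sc->pending = -1;
    }
}
```

Applying the values from the interrupt handler means they change at a point where the driver is already servicing the chip, rather than at an arbitrary moment chosen by whoever posted the request.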
(e.g., polling for a half-second or more at splnet(), blocking most
interrupts, during an ifconfig down/ifconfig up).
Appears to help for a 5704C rev A3, which is the only chip I've
ever seen that had even a mild version of the reported problem.
for RF_DagNode_t's. Scale the structure size based on RF_MAXCOL.
Use the new allocation method in InitNode(). Note that we can't get
rid of the mallocs in there until we can prove that this new
allocation method is a strict upper bound. Unless someone tries
running a RAID set with 40 components, the mallocs here shouldn't
be an issue. (And if someone does make a set with 40 components,
they will run into other issues with other constants long before
then.)
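The inline-array-with-fallback idea can be sketched as follows. MAXCOL stands in for RF_MAXCOL and its value of 40 is only assumed here; struct node and its successor array are hypothetical, not RF_DagNode_t's real layout. Arrays that fit use storage preallocated with the node, and anything larger still falls back to the heap, which mirrors why the mallocs in InitNode() cannot be removed yet.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXCOL 40   /* assumed value; stands in for RF_MAXCOL */

/* Hypothetical node with a per-node array sized for the common case. */
struct node {
    int nsucc;
    struct node **succ;                 /* inline or heap storage */
    struct node  *succ_inline[MAXCOL];  /* preallocated with the node */
};

/* Use the inline array when it is big enough, else fall back to calloc. */
int node_init(struct node *n, int nsucc)
{
    assert(nsucc >= 0);
    n->nsucc = nsucc;
    if (nsucc <= MAXCOL) {
        n->succ = n->succ_inline;
        return 0;
    }
    n->succ = calloc((size_t)nsucc, sizeof(*n->succ));
    return n->succ == NULL ? -1 : 0;
}

/* Free the array only if it did not come from the inline storage. */
void node_cleanup(struct node *n)
{
    if (n->succ != n->succ_inline)
        free(n->succ);
    n->succ = NULL;
}
```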
- Pull rf_FreePhysDiskAddr() out from under a #ifdef, since we're now
going to use it.
- Add a pda_cleanup_list into the DAG header. Use it in rf_FreeDAG() to
clean up any PDAs that get allocated but have no "easy" way of being
located and freed when the DAG completes.
- numStripeUnitsAccessed is a per-stripe value, and has a maximum
value equal to the number of columns (thus limited by RF_MAXCOL).
Use this knowledge to set an upper bound on overlappingPDAs, and stuff
it on the stack instead of malloc'ing it all the time! This costs us
a whopping 40 bytes on the stack, but saves a malloc() and a free().
elements from the pools.
Re-work rf_SelectAlgorithm() to get rid of all 8 mallocs, and to
use the new functions to get/put these 'support structures'. I'm not
overly happy with some of the variable names, but them's the breaks.
In the process of changing things, fix a bug:
- in the case where we can't create a dag, free asmh_b and blockFuncs
too!!
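The leak fixed here is a classic error-path bug. The usual C remedy is a chain of goto labels that unwinds whatever has been allocated so far; a minimal sketch follows, in which the structure, create_dag() and select_algorithm() are hypothetical stand-ins, with asmh and funcs playing the roles of asmh_b and blockFuncs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-request support structures. */
struct support {
    void *asmh;     /* cf. asmh_b */
    void *funcs;    /* cf. blockFuncs */
    void *dag;
};

/* Hypothetical DAG constructor; fails when 'fail' is nonzero. */
static void *create_dag(int fail)
{
    return fail ? NULL : malloc(32);
}

/* All or nothing: on any failure, release what was already allocated. */
int select_algorithm(struct support *s, int fail_dag)
{
    assert(s != NULL);
    s->asmh = malloc(64);
    if (s->asmh == NULL)
        goto fail;
    s->funcs = malloc(64);
    if (s->funcs == NULL)
        goto free_asmh;
    s->dag = create_dag(fail_dag);
    if (s->dag == NULL)
        goto free_funcs;    /* the path that used to leak */
    return 0;

free_funcs:
    free(s->funcs);
free_asmh:
    free(s->asmh);
fail:
    s->asmh = s->funcs = s->dag = NULL;
    return -1;
}
```

Because the labels fall through in reverse allocation order, a failure at any step frees exactly the structures created before it.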
[if you were able to look at the source code related to these changes,
and comprehend what was going on without having your eyes bleed or
getting dizzy, please contact me... I'm sure I'll have more code
which would benefit by you having a look at it before I commit it :) ]
As we turn the chip to big-endian mode on big-endian systems, we should
never byte-swap the data read/written from/to registers. Tested on sparc64.
Finally fix kern/13341 by Jason R. Thorpe (really, the hard work of putting
bus_dmamap_sync() calls at the right places was done by Jason in mid-2001 :)
Trim the priority, as the upper layers won't do it and will drop the packet
if priority is not 0.
While there, print the revision in the "unsupported chip revision" printf.