119 Commits

Author SHA1 Message Date
yamt
5c5fb0b1ea fix parens in a message 2013-03-06 11:38:15 +00:00
oster
cebdda608a Add logic to the main reconstruction loop to handle RAID5 with rotated
spares.  While here, observe that we were actually doing one more
stripe than we thought we were, and correct that too (it didn't matter
for non-RAID5_RS, but it definitely does for RAID5_RS).  Add some
bounds-checking at the beginning to handle the case where the number
of stripes in the set is smaller than the sliding reconstruction window.

XXX: this problem likely needs to be fixed for PARITY_DECLUSTERING too.
2012-02-20 22:42:52 +00:00
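The bounds check and the off-by-one described above can be illustrated with a small userland sketch. It walks a set of stripes in window-sized chunks, clamps the window when the set is smaller than the window, and uses an exclusive end so that no extra stripe is visited. The names `chunk_stats` and `walk_in_chunks` are hypothetical and do not come from the RAIDframe sources; rotated-spare handling is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how many chunks and stripes were visited. */
struct chunk_stats {
	uint64_t chunks;
	uint64_t stripes;
};

/*
 * Walk stripes [0, num_stripes) in chunks of at most `window` stripes.
 * Assumption: `window` is the size of the sliding reconstruction window,
 * expressed in stripes.
 */
struct chunk_stats
walk_in_chunks(uint64_t num_stripes, uint64_t window)
{
	struct chunk_stats st = { 0, 0 };
	uint64_t start = 0;

	assert(window > 0);

	/* Bounds check: the set may be smaller than the window. */
	if (window > num_stripes)
		window = num_stripes;

	while (start < num_stripes) {
		uint64_t end = start + window;	/* exclusive upper bound */

		if (end > num_stripes)
			end = num_stripes;	/* clamp the final chunk */

		st.chunks++;
		st.stripes += end - start;
		start = end;
	}
	return st;
}
```

Treating `end` as exclusive is one way to avoid processing one stripe more than intended; the actual loop in the kernel may be structured differently.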
hannken
2cc7a01f10 Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock.  Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.
2011-10-14 09:23:28 +00:00
oster
28c3372a95 Address part of PR kern/44972. From YAMAMOTO Takashi. Thanks! 2011-08-03 15:00:29 +00:00
yamt
33d93c8dcc rf_ReconstructInPlace: don't leave a vnode open on errors.
fixes a part of PR/44972.
2011-05-28 00:53:04 +00:00
buhrow
463102d28a Suggested to oster@ and approved via private e-mail as a help to
people who are getting reconstruction failures.
2011-05-24 07:33:41 +00:00
mrg
8c36bb4b69 convert the main raidPtr mutex to a kmutex, and add a couple of cv's to
cover the old sleep/wakeup points for adding_hot_spare and waitForReconCond.
convert all remaining simple_lock's to kmutexes (they're not used or compiled
right now... even with all options enabled) and remove the support for them.

this leaves just a pair of tsleep()/wakeup() calls using old scheduling APIs.
2011-05-11 18:13:12 +00:00
mrg
02d186e1e2 convert rb_mutex to a kmutex/cv. 2011-05-02 07:29:18 +00:00
enami
ec02ea412c Define accessors for number of blocks and partition size in the
component label and use them where appropriate.  Discussed on tech-kern.
2011-02-19 07:11:09 +00:00
dholland
8f6ed30d57 Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
2010-11-19 06:44:33 +00:00
mrg
4de66268a7 add support for >2TB raid devices.
- add two new members to the component label:
     u_int numBlocksHi
     u_int partitionSizeHi
  and store the top 32 bits of the real number of blocks and
  partition size.  modify rf_print_component_label(),
  rf_does_it_fit(), rf_AutoConfigureDisks() and
  rf_ReconstructFailedDiskBasic().

- call disk_blocksize() after disk_attach() [ from mlelstv ]

- shift the block number relative to DEV_BSHIFT in raidstart()
  and InitBP() so that accesses work for non 512-byte devices.
  [ from mlelstv ]

- update rf_getdisksize() to use the new getdisksize() [ from
  mlelstv.  this part needs a separate change for netbsd-5. ]


reviewed by: oster, christos and darrenr
2010-11-01 02:35:24 +00:00
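A minimal sketch of how a 64-bit block count can be split across a low and a high 32-bit label field, as the commit above describes for numBlocksHi. The struct `label_sketch` and both function names are hypothetical, the low-half field name `numBlocks` is an assumption, and `uint32_t` stands in for the `u_int` named in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the component label; only two fields modeled. */
struct label_sketch {
	uint32_t numBlocks;	/* low 32 bits (field name assumed) */
	uint32_t numBlocksHi;	/* top 32 bits, as added by the commit */
};

/* Combine the two halves into the real 64-bit block count. */
uint64_t
label_get_num_blocks(const struct label_sketch *l)
{
	return ((uint64_t)l->numBlocksHi << 32) | l->numBlocks;
}

/* Split a 64-bit block count into the two label fields. */
void
label_set_num_blocks(struct label_sketch *l, uint64_t n)
{
	l->numBlocks = (uint32_t)(n & 0xffffffffu);
	l->numBlocksHi = (uint32_t)(n >> 32);
	assert(label_get_num_blocks(l) == n);	/* round-trip sanity check */
}
```

Keeping the original 32-bit field as the low half means a label written for a set under 2TB reads back with a high half of zero.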
jld
f1a1ad338d Finally commit the RAIDframe parity map Summer Of Code project.
Drastically reduces the amount of time spent rewriting parity after an
unclean shutdown by keeping better track of which regions might have had
outstanding writes.  Enabled by default; can be disabled on a per-set
basis, or tuned, with the new raidctl(8) commands.

Discussed on tech-kern@ to a general air of approval; exhortations to
commit from mrg@, christos@, and others.

Thanks to Google for their sponsorship, oster@ for mentoring the
project, assorted developers for trying very hard to break it, and
probably more I'm forgetting.
2009-11-17 18:54:26 +00:00
oster
f17e8d67c4 If we see a RF_RECON_WRITE_ERROR event we know a write has finished and
we need to account for that.  Failure to do so means we can end up
waiting forever for writes we think are outstanding, but which have
already completed.

Addresses the RAIDframe part of PR#40569.  Thanks to Matthias Scheler
for reporting the issue and verifying the fix.
2009-02-11 23:54:10 +00:00
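The accounting problem described above can be sketched in userland as follows: a failed write is still a completed write, so it must decrement the count of outstanding writes, otherwise a waiter that drains on that count never wakes. All names here (`recon_state_sketch`, `handle_event`, the `EV_*` values) are hypothetical and only loosely mirror the event named in the commit message.

```c
#include <assert.h>

/* Hypothetical event kinds; EV_WRITE_ERROR plays the role of
 * RF_RECON_WRITE_ERROR from the commit message. */
enum recon_event_sketch { EV_WRITE_DONE, EV_WRITE_ERROR, EV_READ_DONE };

struct recon_state_sketch {
	int pending_writes;	/* writes issued but not yet completed */
	int write_failed;	/* set once any write reports an error */
};

void
issue_write(struct recon_state_sketch *s)
{
	s->pending_writes++;
}

void
handle_event(struct recon_state_sketch *s, enum recon_event_sketch ev)
{
	switch (ev) {
	case EV_WRITE_DONE:
		assert(s->pending_writes > 0);
		s->pending_writes--;
		break;
	case EV_WRITE_ERROR:
		/* The write failed, but it has finished: account for it. */
		assert(s->pending_writes > 0);
		s->pending_writes--;
		s->write_failed = 1;
		break;
	case EV_READ_DONE:
		break;
	}
}

/* Nonzero once no writes remain outstanding. */
int
writes_drained(const struct recon_state_sketch *s)
{
	return s->pending_writes == 0;
}
```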
oster
73225b15a5 When unconfiguring an array where a reconstruct is in progress, abort
the reconstruct and wait for IOs to drain before pulling the plug.

Should fix the panic reported by der Mouse on tech-kern.
2008-12-20 17:04:51 +00:00
oster
c4025116b9 Nuke unneeded printf(). Spotted by pooka@. 2008-09-23 21:36:35 +00:00
oster
396f9f4598 Re-work some of the guts of the reconstruction code.
Reconmap used to have one pointer for every reconstruction unit.  This
does not scale well in the land of 1TB disks, where some 100MB+ of
"status pointers" are required for typical configurations.  Convert
the reconstruction code to use a "sliding status window" which will
scale nicely regardless of the number of stripes/reconstruction units
in the RAID set.  Convert the main reconstruction loop to rebuild the
array in chunks rather than in one big lump.

As part of these changes, introduce a function to kick any waiters on
the head separation callback list, and use that in the main
reconstruction event queue to wake up the waiters if things have
stalled.  (I believe this may fix a race condition that could occur,
at least at the very end of a disk, during reconstruction under heavy
IO load.)

Thanks to Brian Buhrow for all his help, support, and patience in
testing these changes.
2008-05-19 19:49:54 +00:00
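As a rough illustration of the "sliding status window" idea above, the sketch below tracks completion for only a fixed number of reconstruction units (RUs) at a time, so memory use does not grow with the size of the set. It is not the RAIDframe implementation: `recon_window`, `WINDOW_RUS`, and the three functions are hypothetical names, and the real reconmap tracks more than a done flag per RU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_RUS 8	/* hypothetical window size, in reconstruction units */

struct recon_window {
	uint64_t head;			/* first RU not yet completed */
	uint64_t total;			/* total RUs in the set */
	bool	 done[WINDOW_RUS];	/* status slot for RU r is r % WINDOW_RUS */
};

void
window_init(struct recon_window *w, uint64_t total_rus)
{
	memset(w, 0, sizeof(*w));
	w->total = total_rus;
}

/* True if `ru` currently falls inside the window and inside the set. */
bool
window_covers(const struct recon_window *w, uint64_t ru)
{
	return ru >= w->head && ru < w->head + WINDOW_RUS && ru < w->total;
}

/* Mark `ru` complete, then slide the window past any completed prefix.
 * Returns the new head. */
uint64_t
window_mark_done(struct recon_window *w, uint64_t ru)
{
	assert(window_covers(w, ru));
	w->done[ru % WINDOW_RUS] = true;

	while (w->head < w->total && w->done[w->head % WINDOW_RUS]) {
		w->done[w->head % WINDOW_RUS] = false;	/* recycle the slot */
		w->head++;
	}
	return w->head;
}
```

The status storage here is `WINDOW_RUS` slots regardless of `total`, which is the scaling property the commit message is after; work on RUs beyond the window has to wait until the head advances.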
oster
8fb49f6fa8 A forced recon read should not default to indicating that the reads
for that disk have stopped, since this will bump us out of the normal
reconstruction loop prematurely.

Fixes the (mostly cosmetic) bug where the reconstruction
status values stop updating, and from raidctl it appears that
reconstruction has totally stalled (which it actually hasn't -- the
reconstruction does complete properly, but not in the normal way).
2008-04-15 16:05:43 +00:00
oster
25c8cdfd32 Print out the status value if a reconstruction read fails.
Don't print out write promotions during reconstruct unless
we are debugging reconstructs.
2008-04-14 17:24:50 +00:00
oster
287ee4e9a9 In a land before time, when kernel processes roamed the system, we
needed to keep track of the kernel process that opened a device in
order to close it with the right credentials.  Flash forward to today
where curlwp is now quite sufficient.
2008-01-26 20:44:37 +00:00
pooka
61e8303e9d Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start.  In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
2007-11-26 19:01:26 +00:00
oster
6384685d7c Fix wording in a comment and correct a debug line. From Olivier Cherrier
(via private mail).  Thanks!
2007-09-21 17:14:47 +00:00
ad
1c0f1b255b Fix fallout from recent kthread changes. 2007-07-18 19:04:58 +00:00
ad
88ab7da936 Merge some of the less invasive changes from the vmlocking branch:
- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
2007-07-09 20:51:58 +00:00
cube
954bc13440 Change dk_lookup() to accept an additional argument of the type enum uio_seg
that tells whether the given path is in user space or kernel space, so it
can tell NDINIT().

While the raidframe calls were ok, both ccd(4) and cgd(4) were passing
pointers to user space data, which leads to strange errors on i386, as
reported by Jukka Salmi on current-users.

The issue has been there since last August; I'm actually a bit surprised
that no one in the meantime has used ccd(4) or cgd(4) on an arch where it
would have simply faulted.
2007-06-26 15:22:23 +00:00
christos
168cd830d2 __unused removal on arguments; approved by core. 2006-11-16 01:32:37 +00:00
christos
4d595fd7b1 - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
2006-10-12 01:30:41 +00:00
christos
ecdff16f80 - use dk_lookup instead of our home-spun version.
- allow raid to be configured in a wedge
- allow wedges to be configured in a raid
- add autoconfiguration of wedges in a raid
2006-08-27 05:07:12 +00:00
ad
3029ac48c7 - Use the LWP cached credentials where sane.
- Minor cosmetic changes.
2006-07-21 16:48:45 +00:00
elad
2867b68bc3 integrate kauth. 2006-05-14 21:42:26 +00:00
christos
95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
oster
97682553c4 If rf_SubmitReconBuffer indicates the submission was blocked (for
whatever reason), return 0 instead of the default
RF_RECON_READ_STOPPED.  Returning RF_RECON_READ_STOPPED would result
in rf_ContinueReconstructFailedDisk() thinking that the given
component was "done" and breaking out of the main reconstruction loop
far too early.  Reconstruction still worked correctly as long as there
were no errors, but RAIDframe wouldn't be in a position to properly
handle read/write errors during reconstruction.

This fixes the "raidctl's progress bar spins at 0% until
reconstruction finishes" problem.
2005-07-18 15:32:01 +00:00
oster
77708271bf - initialize numRUsTotal before we indicate that we are doing a reconstruct.
- make numRUsComplete and numRUsTotal 64-bit quantities like
everything else that records this information.
2005-06-08 02:00:53 +00:00
perry
f31bd063e9 nuke trailing whitespace 2005-02-27 00:26:58 +00:00
oster
be864067da The 'next' argument to rf_CreateDiskQueueData is always NULL. Since
there is no particular reason to pass an extra NULL argument, turf it,
and initialize p->next to NULL within the function.
2005-02-12 03:44:41 +00:00
oster
0b15470982 Add a 'waitflag' argument to rf_CreateDiskQueueData() and use it to
determine if we are willing to wait for memory to come from the
diskqueuedata (dqd) and bufpool pools.  Clean up the mess related to
code calling rf_CreateDiskQueueData() with different expectations
(and/or blatant disregard) of what might happen if there were
insufficient pool resources.
2005-02-12 03:27:33 +00:00
oster
04a30b5e78 It's not a bad idea to update the component labels whether or not the
reconstruction was successful.
2005-02-06 02:29:36 +00:00
oster
339f61b703 rf_GetNextReconEvent() *will* return a valid event, so no need for
the assert.  (we'd have panic'ed in there long before this assert
if that wasn't the case).

Minor whitespace changes.
2005-02-05 23:39:12 +00:00
oster
c38bce14f6 Vastly improve the error handling in the case of a read/write error
that occurs during a reconstruction.  We go from zero error handling
and likely panicking if something goes amiss, to gracefully bailing and
leaving the system in the best, usable state possible.

- introduce rf_DrainReconEventQueue() to allow easy cleaning of the
reconstruction event queue

- change how we clean up the floating recon buffers in
rf_FreeReconControl().  Detect the end of the list rather
than traversing according to a count.

- keep track of the number of pending reconstruction writes.  In the
event of a read error, use this to wait long enough for the pending
writes to (hopefully) drain.

- more cleanup is still needed on this code, but I didn't want to
start mixing major functional changes with minor cleanups.

XXX: There is a known issue with pool items left outstanding due to
the IO failure, and this can show up in the form of a panic at the
tail end of a shutdown.  This problem is much less severe than before
these changes, and the hope/plan is that this problem will go away
once this code gets overhauled again.
2005-02-05 23:32:43 +00:00
oster
c18a242754 Torch some #define's missed in last commit. 2005-01-22 02:24:31 +00:00
oster
3140947870 Reconstruction Descriptors are only allocated once per reconstruction,
and don't need their own pool or freelist or anything fancier than a
malloc/free.
2005-01-22 02:22:44 +00:00
oster
26187fa579 ForceReconReadDoneProc() needs a return after doing the first
rf_CauseReconEvent().
2005-01-18 03:29:51 +00:00
oster
10928931ab The switch() in rf_ContinueReconstructFailedDisk() is never actually
used in non-simulation code, and thus is just wasting space (and
making the code more confusing to read!).  Turf the switch, left-shift
the indentation of code, and nuke 'state' field of struct RF_RaidReconDesc_s.

No real functional changes.
2004-12-12 20:53:15 +00:00
oster
5cdd8e2bd5 continueFunc and continueArg aren't used. Turf. Simplify calls to
rf_GetNextReconEvent().
2004-11-15 17:16:28 +00:00
oster
1051cc745f Re-work the locking mechanisms for reconstruct and PSS structures
such that we don't actually hold a simplelock while we are doing
a pool_get(), but still effectively protect critical code.

This should fix all of the outstanding LOCKDEBUG warnings related to
rebuilding RAID sets.
2004-03-18 16:54:54 +00:00
oster
8150ff6fbd - don't use rf_PrintUserStats() for recon statistics.
rf_PrintUserStats() was meant for the simulator, and doesn't provide
any real info in kernel-space, especially for reconstructs.
Reconstructing actually renders the stats even more useless, since it
resets them all to zero before the reconstruct starts!

 - since rf_PrintUserStats() is no longer used, nuke it along with the
routines that feed it.  Nothing was using this code, and if we ever
need it again, we know where to find it.
2004-03-13 02:00:15 +00:00
oster
f95359dd19 - Introduce rf_pools which contains all of the various global pools used
by RAIDframe.  Convert all other RAIDframe global pools to use pools
defined within this new structure.
- Introduce rf_pool_init(), used for initializing a single pool in
RAIDframe.  Teach each of the configuration routines to use
rf_pool_init().
- Clean up a few pool-related comments.
- Clean up revent initialization and #defines.
- Add a missing pool_destroy() for the reconbuffer pool.

(Saves another 1K off of an i386 GENERIC kernel, and makes
stuff a lot more readable)
2004-03-07 22:15:19 +00:00
oster
d02f580adf - fix up initialization of rf_recond_pool
- introduce rf_reconbuffer_pool and teach rf_MakeReconBuffer() to use it
2004-03-07 02:46:58 +00:00
oster
bfeeabba13 Use RF_INCLUDE_PARITY_DECLUSTERING_DS to #if-out more unneeded bits.
(We can't do RF_DISTRIBUTE_SPARE bits without the parity declustering stuff.)
2004-03-05 03:58:21 +00:00
oster
2fb9f8db54 Nuke some unnecessary casts. No functional changes. 2004-03-03 17:14:46 +00:00
oster
28bd6c8ea2 Introduce RF_REVENT_READ_FAILED, RF_REVENT_WRITE_FAILED and RF_REVENT_FORCEREAD_FAILED.
This removes 3 more RF_PANIC()'s (but we'll currently still panic if any of these cases occur).
fix up a few printf's.
XXX: still needs more cleanup and testing (and be taught to not panic).
2004-03-03 16:59:54 +00:00