between write i/os in a disk-based filesystem vs. the disk block being
freed by a truncation, allocated to a new file, and written again with
different data. if the disk driver reorders the requests and does
the second i/o first, the old data will clobber the new, corrupting
the new file.
tsleep() instead of DELAY. Also, keep trying to flush buffers as long as
the number of dirty buffers decreases (20 rounds may not be enough for a
very large buffer cache).
Using tsleep instead of delay gives other kernel threads a chance to run,
which is needed for raidframe. With this change I've not been able to
reproduce the 'dirty buffer not flushed' problem with raidframe.
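A rough sketch of the retry loop described above (the count_dirty_buffers()
helper is made up for illustration; this is not the actual vfs_shutdown()
code):

	int iter, nbusy, lastbusy;

	lastbusy = -1;
	for (iter = 0; ; iter++) {
		nbusy = count_dirty_buffers();	/* assumed helper */
		if (nbusy == 0)
			break;			/* everything flushed */
		if (iter >= 20 && nbusy >= lastbusy)
			break;			/* no further progress */
		lastbusy = nbusy;
		/*
		 * Sleep instead of spinning with DELAY() so that other
		 * kernel threads (e.g. raidframe's) get a chance to run
		 * and complete their own i/o.
		 */
		(void)tsleep(&nbusy, PRIBIO, "bflush", hz / 4);
	}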
use that to inform about the way to raise the current limit when we reach
the maximum number of processes, descriptors or vnodes.
XXX hopefully I caught all users of tablefull()
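A sketch of how a caller passes the extra hint (assuming tablefull() now
takes a second message argument; the exact wording is illustrative):

	if (nprocs >= maxproc) {
		tablefull("proc", "increase kern.maxproc or MAXUSERS");
		return (EAGAIN);
	}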
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() has been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.
Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.
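For illustration, the ltsleep() part of this change is roughly of the
following shape (a sketch, not the committed diff):

	int s;

	if (cold || panicstr || (curproc == NULL && doing_shutdown)) {
		/*
		 * After a panic, or when shutting down without a process
		 * context, don't try to context switch: just give
		 * interrupts a chance to run, then return.
		 */
		s = splhigh();
		splx(safepri);
		splx(s);
		return (0);
	}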
in vfs_detach(). vfs_done may free a filesystem's global resources,
typically those allocated in the respective filesystem's init function.
Needed so that filesystems which went in via LKM have a chance to
clean up after themselves before unloading. This fixes random panics
when an LKM for a filesystem using pools was loaded and unloaded
several times.
For each leaf filesystem, add an appropriate vfs_done routine.
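A minimal sketch of such a vfs_done routine, assuming the filesystem's
init routine created a pool (the somefs_* names are made up):

	void
	somefs_done(void)
	{

		/* release the global resources set up in somefs_init() */
		pool_destroy(&somefs_node_pool);
	}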
mode and ownership bits are flushed to disk before the vnode is
reclaimed.
The check, introduced in the softdep merge, assumes that if no blocks
are dirty, no file data *or metadata* needs to be flushed to disk. This
is true of ffs, but is not true of lfs, and may not be true of other
filesystems.
Tested by myself and Bill Squier <groo@cs.stevens-tech.edu>.
(Previously buffers could be marked dirty by the cleaner, and possibly by
other means.)
Also check for softdep mount in vfs_shutdown before trying to bawrite
buffers, since other filesystems don't need it and lfs doesn't bawrite.
(This fragment reviewed by fvdl.)
Partially addresses PR#8964.
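The vfs_shutdown() fragment mentioned above is roughly of this shape (a
sketch under made-up helper names, not the reviewed diff):

	struct mount *mp;
	int dosoftdep = 0;

	CIRCLEQ_FOREACH(mp, &mountlist, mnt_list) {
		if (mp->mnt_flag & MNT_SOFTDEP) {
			dosoftdep = 1;
			break;
		}
	}
	if (dosoftdep)
		bawrite_dirty_buffers();	/* hypothetical helper */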
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.
Bump version number to 1.4O
The problem was due to an interaction between the doomed unmounts done by
amd and getnewvnode.
I convinced myself that it's ok for getnewvnode() to do a sleeping vfs_busy().
Tested with multiple builds running while another process attempted to unmount
/usr once a second.
getnewvnode now checks this bit, and if it's set makes sure a vnode's not
locked before removing it from the free list.
Closes PR 7954 by Alan Barrett <apb@iafrica.com>.
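The getnewvnode() check amounts to something like this sketch (the excerpt
doesn't name the flag, so VDOOMCHECK below is only a placeholder):

	if ((vp->v_flag & VDOOMCHECK) && VOP_ISLOCKED(vp))
		continue;	/* leave the locked vnode on the free list */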
mp->mnt_flags & MNT_MWAIT is replaced by mp->mnt_wcnt, and a new mount
flag MNT_GONE is created (reusing the same bit).
In insmntque(), add a DIAGNOSTIC check to fail if the filesystem the
vnode is being moved to is in the process of being unmounted.
getnewvnode() now protects the list of vnodes active on mp with
vfs_busy()/vfs_unbusy().
To avoid generating spurious errors during a doomed unmount, change
the "wait for unmount to finish" protocol between dounmount() and
vfs_busy(). In vfs_busy(), instead of only sleeping once, sleep until
either MNT_UNMOUNT is clear or MNT_GONE is set; also, maintain a count
of waiters in mp->mnt_wcnt so that dounmount() knows when it's safe to
free mp.
tested by running a "while :; do mount /d1; umount -f /d1; done" loop
against multiple find(1) processes.
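A sketch of the resulting wait loop in vfs_busy() (illustrative, not the
committed code):

	while (mp->mnt_flag & MNT_UNMOUNT) {
		int gone;

		mp->mnt_wcnt++;
		tsleep(mp, PVFS, "vfs_busy", 0);
		gone = mp->mnt_flag & MNT_GONE;
		mp->mnt_wcnt--;
		if (mp->mnt_wcnt == 0)
			wakeup(&mp->mnt_wcnt);	/* dounmount() may free mp now */
		if (gone)
			return (ENOENT);	/* unmount completed */
		/* otherwise the unmount failed; re-check MNT_UNMOUNT */
	}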
deadlock in VOP_FSYNC() if the unreferenced vnode picked for
reclamation happened to be stacked on top of a vnode the process
already had locked. This could happen if the same filesystem was
accessed both through a union mount and directly; it seemed to happen
most frequently when the direct access was through NFS.
Avoid this deadlock by changing vinvalbuf to pass a new FSYNC_RECLAIM
flag bit to VOP_FSYNC() to indicate that a reclaim is in progress and
only a `shallow' fsync is necessary.
Do nothing in *_fsync() in umapfs, nullfs, and unionfs when
FSYNC_RECLAIM is set; the underlying vnodes will shortly be released
in *_reclaim and may be reclaimed (and fsync'ed) later.
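The layered-filesystem part boils down to an early return of this shape
(layerfs_fsync is a hypothetical name, not the committed diff):

	int
	layerfs_fsync(void *v)
	{
		struct vop_fsync_args *ap = v;

		if (ap->a_flags & FSYNC_RECLAIM) {
			/*
			 * Only a shallow fsync is needed: the underlying
			 * vnode is released in *_reclaim shortly and can
			 * be fsync'ed when it is reclaimed itself.
			 */
			return (0);
		}
		/* otherwise fall through to the normal bypass/fsync path */
		return (layerfs_normal_fsync(v));	/* hypothetical */
	}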