Another process could be vget()ing the vnode and bump v_usecount while
getcleanvnode() is vclean()ing it (as vclean drops the interlock).
vget() will then wait for VI_XLOCK or VI_FREEING to clear; and we could test
this assertion while the other process is still slepping. We could even
end up in ungetnewvnode() before this other process got a chance to run.
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.
no functional change
can sleep on the vnode lock, while vget is sleeping on the
VI_INACTNOW flag (or the vget caller is looping on vget returning failure
because of the VI_INACTNOW flag). With layered FSes, the upper and lower
vnodes share the same lock, so the vget() caller above can be already
holding the vnode lock.
Fix by dropping VI_INACTNOW before sleeping on the vnode lock in
vrelel(), and check the ref count again once we have the lock. If the
vnode has more than one reference, donc VOP_INACTIVE it.
Fix PR kern/42318 and PR kern/42377
patch tested by Hisashi T Fujinaka, Joachim König, Stephen Borrill and
Matthias Scheler.
reference while we were getting the v_interlock.
vget(): attempt prevent it from returning a clean vnode:
if the vnode is being inactivated (by vrelel()), wait for
vrelel() to complete (or return EBUSY if we can't wait), and return
ENOENT if the vnode has been vclean'ed by vrelel()
Fix kern/41147 in a better way, hopefully fix other related race conditions.
filesystem is mounted. Synchronize access to the number with a
mutex. When a struct mount, mp, is allocated, assign the current
generation number to mp->mnt_gen. Introduce vfs_unmount_forceone()
that forcefully unmounts the most recently mounted filesystem.
Refactor: extract vfs_shutdown1() from vfs_shutdown(). Extract
vfs_sync_all() from vfs_shutdown1().
Print more progress indications while we're unmounting all of the
filesystems during shutdown.
We increase the reference count on mp before calling dounmount(mp),
but we do not decrease it if dounmount(mp) fails, and neither does
dounmount(mp). So decrease the reference count if dounmount(mp)
fails.
Change the loop terminating condition in vfs_unmountall1() to (mp
!= (void *)&mountlist) from !CIRCLEQ_EMPTY(&mountlist), because we
may not ever empty the list, especially if we're not forcing the
filesystems to unmount.
the other routines of the same spirit.
Adjust file-system code to use it.
Keep vaccess() for KPI compatibility and to keep element of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).
No objections on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
this fixes the following deadlock.
a thread doing getcleanvnode:
pick a vnode
acqure v_interlock
v_usecount++
call vclean
now, another thread doing cache_lookup:
picks the vnode
vtryget succeed
vn_lock succeed
now in vclean:
set VI_XLOCK (too late to be noticed by the competing thread)
wait on the vnode lock (this might violate locking order)
the use of a flag bit was suggested by Andrew Doran. PR/41374.
way they can be included without having to include DDB.
(arguably all print routines should be behind #ifdef DEBUGPRINT
and options DDB should define that macro, but I'll tackle that later)
a new struct mount-allocation routine, vfs_mountalloc(9). Documentation
updates will follow.
Attention: Synchronization Oversight Committee! In mount_domount(),
I postpone the call mutex_enter(&mp->mnt_updating) until right before
the VFS_MOUNT(9) call because (1) that looks to me like the earliest
possible opportunity for mp to become visible to any other LWP, because
it was just kmem_zalloc(9)'d and (2) it made extracting the common code
much easier. Tell me if my reasoning is faulty.
filesystem at all, false otherwise. This will support tearing down
stacks of filesystems, ccd(4), raid(4), and vnd(4).
Change the misleading variable name 'allerror' to 'any_error'. Make it
a bool.
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep
Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
- vfs_syscalls.c rev. 1.342 fails to invert condition correcly when
then-clause and else-clause is swapped. Since then, revoke(2) fails
if it is issued by file owner.
- Probably since rev. 1.160 of genfs_vnops.c, revoke(2) fails if it is
applied to non-device file and drops kernel into ddb.
embedding the address of its xxx_mountroot() in swapnetbsd.c. This
permits booting of kernels with hard-wired filesystem type even if the
filesystem is in a loadable module (ie, not linked into the kernel
image).
Discussed on current-users. Tested on amd64 and i386 with both hard-
wired and '?' filesystem times, and on both modular and monolithic
kernels.
Thanks to pooka@ for code review and suggestions.
Addresses my PR kern/40167
Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.
OK'd by core@, releng@.