Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.
As a consequence, most of the file systems can now be loaded as new style
modules.
Quick sanity check by ad@.
Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.
Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:
- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.
- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.
One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:
- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
we no longer need to guard against access from hardware interrupt handlers.
Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
- Add a lot of missing selinit() and seldestroy() calls.
- Merge selwakeup() and selnotify() calls into a single selnotify().
- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.
Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
v_interlock. They are actually the same lock, but the former protects
the uvm object associated with the vnode, and the latter vnode
reference counts. Explained to me by chs@.
obtaining interlock on container vnode in coda_{get,put}pages. This
is the only functional change in this commit.
Improve many comments. In particular, note that the relationship
between VOP_OPEN and obtaining a container file (e.g. for getpages for
executables) is messy.
Add printfs for 'internal open' cases in coda_rdwr. These have not
been triggered in my testing. Note an apparent vref leak.
does not trigger assertions in uvm_fault, and executing files from
coda works as well.
Code very lightly reviewed by wrstuden@; scrutiny by those who
understand vnode and especially {get,put}pages would be appreciated.
Re-enable mmap. The problem is how uvm_fault handles page faults from
coda vnodes via container files, and executing a program caused the
same problem so disabling mmap only helped cp(1).
coda_open:
rename variables to match vnode_if.src
better comments about lock/reference state of vnodes
keep lock on container file until after VOP_OPEN, which requires locked vp
remove #if 0'd code to PNBUF_PUT
coda_link:
rename variables to match vnode_if.src
error out early if vp == dvp
check return value on vn_lock, and add comment questoining the lock
clarify lock handling, but unchanged logic
remove #if 0'd code to PNBUF_PUT
coda_rmdir:
error out early if vp == dvp
remove #if 0'd code to PNBUF_PUT
coda_grab_vnode:
add comments, and in particular question undocumented VFS_VGET semantics
coda_getpages:
question calling VOP_OPEN, which requires a locked vnode, with the
vnode we got (vop_getpages does not guarantee a locked vnode)
coda_putpages:
remove inexplicable simple_unlock(&vp->v_interlock);
add printf so we notice if this is ever called
add comment explaining that the implementation will lead to trouble,
because vnode_if.src says putpages is called with v_uobj.vmobjlock
held and is supposed to unlock it
With these changes and an uncommitted change to uvm_fault not to panic
if uvm objects are not equal, coda seems stable again.
got a panic in uvm_fault from ffs_write. I believe this is because cp
used mmap, the container file page was not in core, and uvm_fault
objected to the container file vnode and the coda vnode not matching.
I have long been plagued by crashes on cp from coda, and this was the
first time I got and understood a backtrace.
Clean up old comments that are no longer accurate.
Document refcounting better.
Note some questionable behaviors with XXX.
Clean up PNBUF_PUT and SAVESTART. Only do this where vnodeops(9) says
we should, and do it on error also.
In symlink, vput parent and free namebuf even in error cases.
the unlock parent, lock child, lock parent in the ISDOTDOT case.
Clean up and rewrite comments to match more closely current reality.
Sprinkle XXX where I'm not sure the current rules are being followed.
Reviewed by wrstuden@, who agreed that this is an improvement over the
current code, with concerns about LK_RETRY and whether the ISDOTDOT
locking is done soon enough.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earlierst last.
An extra benefit is the removal of the ugly hack from the Berkly days on
LFS.
In the proces, i've also replaced the various variations hand written loops
by the TAILQ_FOREACH() macro's.