- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
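
The reference chain described above (vnode -> mount -> vfsops) can be
sketched in user space roughly as follows. This is only an illustration:
every type and function name here (vfsops_model, mount_model,
vnode_model and their helpers) is hypothetical and does not correspond
to the kernel's actual structures, and a real kernel would use atomic
operations rather than plain increments.

```c
#include <stdlib.h>

/* Hypothetical user-space stand-ins; not the kernel's real structures. */
struct vfsops_model {
	unsigned int refcnt;		/* mounts using these ops */
};

struct mount_model {
	unsigned int refcnt;		/* creator + one per vnode */
	struct vfsops_model *ops;
};

struct vnode_model {
	struct mount_model *mp;
};

/* Create a mount: it starts with one reference and pins its vfsops. */
struct mount_model *
mount_model_create(struct vfsops_model *ops)
{
	struct mount_model *mp = malloc(sizeof(*mp));

	if (mp == NULL)
		return NULL;
	mp->refcnt = 1;
	mp->ops = ops;
	ops->refcnt++;
	return mp;
}

/* Drop one mount reference; the last one unpins the vfsops and frees. */
void
mount_model_release(struct mount_model *mp)
{
	if (--mp->refcnt == 0) {
		mp->ops->refcnt--;
		free(mp);
	}
}

/* Each vnode associated with a mount takes a reference on it. */
struct vnode_model *
vnode_model_create(struct mount_model *mp)
{
	struct vnode_model *vp = malloc(sizeof(*vp));

	if (vp == NULL)
		return NULL;
	vp->mp = mp;
	mp->refcnt++;
	return vp;
}

/* Destroying a vnode gives its mount reference back. */
void
vnode_model_destroy(struct vnode_model *vp)
{
	mount_model_release(vp->mp);
	free(vp);
}
```

In this model the mount outlives an "unmount" (the creator dropping its
reference) for as long as any vnode still points at it, and the vfsops
count only drops once the mount itself is freed.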
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
v_interlock. They are actually the same lock, but the former protects
the uvm object associated with the vnode, and the latter vnode
reference counts. Explained to me by chs@.
obtaining interlock on container vnode in coda_{get,put}pages. This
is the only functional change in this commit.
Improve many comments. In particular, note that the relationship
between VOP_OPEN and obtaining a container file (e.g. for getpages for
executables) is messy.
Add printfs for 'internal open' cases in coda_rdwr. These have not
been triggered in my testing. Note an apparent vref leak.
does not trigger assertions in uvm_fault, and executing files from
coda works as well.
Code very lightly reviewed by wrstuden@; scrutiny by those who
understand vnode and especially {get,put}pages would be appreciated.
Re-enable mmap. The problem is how uvm_fault handles page faults from
coda vnodes via container files, and executing a program caused the
same problem so disabling mmap only helped cp(1).
coda_open:
rename variables to match vnode_if.src
better comments about lock/reference state of vnodes
keep lock on container file until after VOP_OPEN, which requires locked vp
remove #if 0'd code to PNBUF_PUT
coda_link:
rename variables to match vnode_if.src
error out early if vp == dvp
check return value on vn_lock, and add comment questioning the lock
clarify lock handling; logic unchanged
remove #if 0'd code to PNBUF_PUT
coda_rmdir:
error out early if vp == dvp
remove #if 0'd code to PNBUF_PUT
coda_grab_vnode:
add comments, and in particular question undocumented VFS_VGET semantics
coda_getpages:
question calling VOP_OPEN, which requires a locked vnode, with the
vnode we got (vop_getpages does not guarantee a locked vnode)
coda_putpages:
remove inexplicable simple_unlock(&vp->v_interlock);
add printf so we notice if this is ever called
add comment explaining that the implementation will lead to trouble,
because vnode_if.src says putpages is called with v_uobj.vmobjlock
held and is supposed to unlock it
With these changes and an uncommitted change to uvm_fault not to panic
if uvm objects are not equal, coda seems stable again.
got a panic in uvm_fault from ffs_write. I believe this is because cp
used mmap, the container file page was not in core, and uvm_fault
objected to the container file vnode and the coda vnode not matching.
I have long been plagued by crashes on cp from coda, and this was the
first time I got and understood a backtrace.
Clean up old comments that are no longer accurate.
Document refcounting better.
Note some questionable behaviors with XXX.
Clean up PNBUF_PUT and SAVESTART. Only do this where vnodeops(9) says
we should, and do it on error also.
In symlink, vput parent and free namebuf even in error cases.
the unlock parent, lock child, lock parent in the ISDOTDOT case.
Clean up and rewrite comments to match more closely current reality.
Sprinkle XXX where I'm not sure the current rules are being followed.
Reviewed by wrstuden@, who agreed that this is an improvement over the
current code, with concerns about LK_RETRY and whether the ISDOTDOT
locking is done soon enough.
The suspension helpers are now put into file system specific operations.
This means a file system that does not support these helpers cannot be
suspended, and therefore snapshots of it are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled by the kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
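
The vfs_vptofh/vfs_fhtovp item above replaces "NULL means unsupported"
with "always supply a routine, possibly one that fails". A rough
user-space sketch of that pattern follows. All names (fs_ops_model,
vptofh_notsupp, fhtovp_notsupp, export_handle_model) are hypothetical,
and the stubs are given exact signatures here instead of relying on a
cast of a generic eopnotsupp().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table; both handle routines are mandatory. */
struct fs_ops_model {
	const char *name;
	int (*vptofh)(const void *vp, void *fh);
	int (*fhtovp)(const void *fh, void **vpp);
};

/* Stubs for file systems that do not support being exported. */
int
vptofh_notsupp(const void *vp, void *fh)
{
	(void)vp;
	(void)fh;
	return EOPNOTSUPP;
}

int
fhtovp_notsupp(const void *fh, void **vpp)
{
	(void)fh;
	(void)vpp;
	return EOPNOTSUPP;
}

/* A file system that cannot be exported fills in the stubs. */
const struct fs_ops_model nonexport_ops_model = {
	"nonexportfs", vptofh_notsupp, fhtovp_notsupp
};

/*
 * Callers (including stacked layers) can now call through the table
 * unconditionally; no NULL check is needed.
 */
int
export_handle_model(const struct fs_ops_model *ops, const void *vp, void *fh)
{
	return ops->vptofh(vp, fh);
}
```

The point of the design is that a caller which forgets the check gets an
error return instead of a jump through a NULL pointer.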
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.
An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.
In the process, I've also replaced the various hand-written loop variants
with the TAILQ_FOREACH() macro.
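
For illustration, here is the kind of conversion meant by the
TAILQ_FOREACH() change, using <sys/queue.h>. The node type and the
collect_* helper names are made up for this sketch and are not taken
from the changed code.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element standing in for a vnode on a mount list. */
struct node_model {
	int id;
	TAILQ_ENTRY(node_model) link;
};
TAILQ_HEAD(node_list_model, node_model);

/* Hand-written traversal, head to tail. */
int
collect_manual(struct node_list_model *head, int *out, int max)
{
	struct node_model *n;
	int count = 0;

	for (n = TAILQ_FIRST(head); n != NULL && count < max;
	    n = TAILQ_NEXT(n, link))
		out[count++] = n->id;
	return count;
}

/* The same traversal expressed with TAILQ_FOREACH(). */
int
collect_foreach(struct node_list_model *head, int *out, int max)
{
	struct node_model *n;
	int count = 0;

	TAILQ_FOREACH(n, head, link) {
		if (count == max)
			break;
		out[count++] = n->id;
	}
	return count;
}
```

Both functions visit elements in the same order; the macro only removes
the opportunity for each loop to be written slightly differently.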
dereferencing it. (I added this during search for the problem fixed by
the earlier readlink buffer allocation fix, and the checks have not
triggered. Still, it's wrong of the kernel to use pointers from user
space without validation.)
used.
Fix the size computation for the coda_readlink allocation: include the
pathname space in the size before allocating, so that venus does not
write outside the malloced space.
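
A minimal sketch of the allocation rule described above: the buffer
handed to the user-space daemon must be sized for the fixed header plus
the largest path the daemon may write back. The structure layout, the
MODEL_MAXPATHLEN constant and the function names are assumptions made
for this example, not the actual coda message format.

```c
#include <stdlib.h>
#include <string.h>

#define MODEL_MAXPATHLEN 1024	/* assumed upper bound on a link target */

/* Hypothetical reply header; the link target follows it in memory. */
struct readlink_reply_model {
	int result;
	size_t count;		/* bytes of link target written */
	size_t data_offset;	/* where the target starts */
};

/* Size the buffer for header AND path space before allocating. */
size_t
readlink_reply_size(void)
{
	return sizeof(struct readlink_reply_model) + MODEL_MAXPATHLEN;
}

struct readlink_reply_model *
readlink_reply_alloc(void)
{
	struct readlink_reply_model *r = calloc(1, readlink_reply_size());

	if (r != NULL)
		r->data_offset = sizeof(*r);
	return r;
}

/* Stand-in for the daemon writing the link target after the header. */
int
readlink_reply_fill(struct readlink_reply_model *r, const char *target)
{
	size_t len = strlen(target);

	if (len > MODEL_MAXPATHLEN)
		return -1;	/* would not fit in the allocated space */
	memcpy((char *)r + r->data_offset, target, len);
	r->count = len;
	return 0;
}
```

If the allocation were sized for the header alone, the memcpy above
would land outside the allocated space, which is the class of bug the
commit describes.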
Add asserts that cred structure is non-NULL and non-FSCRED.
Check lwp against NULL before dereferencing it.
Assert that output pointer is non-NULL on a few venus returns. This "can't
happen" but has been seen in crash dumps.
With these changes, the following work on a 345 MB coda volume.
(Before, a single invocation of tar or pax on this volume would
crash.)
$ for i in $(seq 1 10); do find . -type f -print0 |xargs -0 md5 > MD5.$i & done
Two copies of
$ for i in $(seq 1 10); do pax -w /coda/[redacted] >/dev/null & done
(lwp NULL check semi-reviewed by wrstuden@)
- use vmspace rather than proc or lwp where appropriate.
vmspace is the more natural way to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
code panicked if the first attempt to lock the vnode failed, but such
failures are not errors, just cause to wait. gdt was regularly
hitting this panic.
Correct one of two identical panic messages.
Add XXX comments about
ISDOTDOT locking rules not being followed
questioning the practice of unlocking parent before locking child.
(But, given that the vnode is referenced, it can't be deleted, so
maybe this is fine.)
Why is failure to unlock not a panic but failure to lock is?