Re-enable mmap. The problem is how uvm_fault handles page faults from
coda vnodes via container files, and executing a program caused the
same problem so disabling mmap only helped cp(1).
coda_open:
rename variables to match vnode_if.src
better comments about lock/reference state of vnodes
keep lock on container file until after VOP_OPEN, which requires locked vp
remove #if 0'd code to PNBUF_PUT
coda_link:
rename variables to match vnode_if.src
error out early if vp == dvp
check return value on vn_lock, and add comment questoining the lock
clarify lock handling, but unchanged logic
remove #if 0'd code to PNBUF_PUT
coda_rmdir:
error out early if vp == dvp
remove #if 0'd code to PNBUF_PUT
coda_grab_vnode:
add comments, and in particular question undocumented VFS_VGET semantics
coda_getpages:
question calling VOP_OPEN, which requires a locked vnode, with the
vnode we got (vop_getpages does not guarantee a locked vnode)
coda_putpages:
remove inexplicable simple_unlock(&vp->v_interlock);
add printf so we notice if this is ever called
add comment explaining that the implementation will lead to trouble,
because vnode_if.src says putpages is called with v_uobj.vmobjlock
held and is supposed to unlock it
With these changes and an uncommitted change to uvm_fault not to panic
if uvm objects are not equal, coda seems stable again.
got a panic in uvm_fault from ffs_write. I believe this is because cp
used mmap, the container file page was not in core, and uvm_fault
objected to the container file vnode and the coda vnode not matching.
I have long been plagued by crashes on cp from coda, and this was the
first time I got and understood a backtrace.
Clean up old comments that are no longer accurate.
Document refcounting better.
Note some questionable behaviors with XXX.
Clean up PNBUF_PUT and SAVESTART. Only do this where vnodeops(9) says
we should, and do it on error also.
In symlink, vput parent and free namebuf even in error cases.
the unlock parent, lock child, lock parent in the ISDOTDOT case.
Clean up and rewrite comments to match more closely current reality.
Sprinkle XXX where I'm not sure the current rules are being followed.
Reviewed by wrstuden@, who agreed that this is an improvement over the
current code, with concerns about LK_RETRY and whether the ISDOTDOT
locking is done soon enough.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earlierst last.
An extra benefit is the removal of the ugly hack from the Berkly days on
LFS.
In the proces, i've also replaced the various variations hand written loops
by the TAILQ_FOREACH() macro's.
dereferencing it. (I added this during search for the problem fixed by
the earlier readlink buffer allocation fix, and the checks have not
triggered. Still, it's wrong of the kernel to use pointers from user
space without validation.)
used.
Remove defect in size allocation for coda_readlink to avoid having
venus write outside malloced space by including pathname space before
allocation.
Add asserts that cred structure is non-NULL and non-FSCRED.
Check lwp against NULL before dereferencing it.
Assert that output pointer is non-NULL on a few venus returns. This "can't
happen" but has been seen in crash dumps.
With these changes, the following work on a 345 MB coda volume.
(Before, a single invocation of tar or pax on this volume would
crash.)
$ for i in $(seq 1 10); do find . -type f -print0 |xargs -0 md5 > MD5.$i & done
Two copies of
$ for i in $(seq 1 10); do pax -w /coda/[redacted] >/dev/null & done
(lwp NULL check semi-reviewed by wrstuden@)
- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
code paniced if the first attempt to lock the vnode failed, and such
failures are not errors - just cause to wait. gdt was regularly
hitting this panic.
Correct one of two identical panic messages.
Add XXX comments about
ISDOTDOT locking rules not being followed
questioning the practice of unlocking parent before locking child.
(But, given that the vnode is referenced, it can't be deleted, so
maybe this is fine.)
Why is failured to unlock not a panic but failure to lock is?
- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.
Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.
Welcome to 2.0F.
Approved by: Jason R. Thorpe <thorpej@netbsd.org>
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
no longer use and/or need it
- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argumet from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change
Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.