This uglifies the interface, because several operations need to be
passed the namei flags and cache_lookup also needs for the time being
to be passed cnp->cn_nameiop. Nonetheless, it's a net benefit.
The glop should be able to go away eventually but requires structural
cleanup elsewhere first.
This change requires a kernel bump.
- Move the namecache's hash computation to inside the namecache code,
instead of being spread out all over the place. Remove cn_hash from
struct componentname and delete all uses of it.
- It is no longer necessary (if it ever was) for cache_lookup and
cache_lookup_raw to clear MAKEENTRY from cnp->cn_flags for the cases
that cache_enter already checks for.
- Rearrange the interface of cache_lookup (and cache_lookup_raw) to
make it somewhat simpler, to exclude certain nonexistent error
conditions, and (most importantly) to make it not require write access
to cnp->cn_flags.
This change requires a kernel bump.
parent, keeping them active, and allowing to lookup .. without sending
a request to the filesystem.
Enable the featuure for perfused, as this is how FUSE works.
be vfs_detach'ed by module autounload before puffs_vfsop_unmount() completes
and has freed ressource from the pools. By holding a reference on
puffs_vfsops from each mount, we ensure that no race can occur here.
Works around the crash in kern/46734
We introduced a slow queue for delayed reclaims, while the existing
queue for unmount, flush and exist has been renamed fast queue. Both
queues had timestamp for when an operation should be done, but it was
useless for the fast queue, which is always used to run an operation
ASAP. And the timestamp test had an error that turned ASAP into "at next
tick", but nobody what there to wake the thread at next tick, hence
the hang. The fix is to remove the useless and buggy timestamp test for
fast queue.
The normal kernel behavior is to retain inactive nodes in the freelist
until it runs out of vnodes. This has some merit for local filesystems,
where the cost of an allocation is about the same as the cost of a
lookup. But that situation is not true for distributed filesystems.
On the other hand, keeping inactive nodes for a long time hold memory
in the file server process, and when the kernel runs out of vnodes, it
produce reclaim avalanches that increase lattency for other operations.
We do not reclaim inactive vnodes immediatly either, as they may be
looked up again shortly. Instead we introduce a grace time and we
reclaim nodes that have been inactive beyond the grace time.
- Fix lookup/reclaim race condition.
The above improvement undercovered a race condition between lookup and
reclaim. If we reclaimed a vnode associated with a userland cookie while
a lookup returning that same cookiewas inprogress, then the kernel ends
up with a vnode associated with a cookie that has been reclaimed in
userland. Next operation on the cookie will crash (or at least confuse)
the filesystem.
We fix this by introducing a lookup count in kernel and userland. On
reclaim, the kernel sends the count, which enable userland to detect
situation where it initiated a lookup that is not completed in kernel.
In such a situation, the reclaim must be ignored, as the node is about
to be looked up again.
instance when doing a fault-issued VOP_GETPAGES within VOP_WRITE, changing
size leads to panic: genfs_getpages: past eof.
-Handle ticks wrap around for vnode name andattribute timeout
lookup, create, mknod, mkdir, symlink, getattr and setattr messages
have been extended so that attributes and their TTL can be provided
by the filesytem. lookup, create, mknod, mkdir, and symlink messages
are also extended so that the filesystem can provide name TTL.
The change does not make consensus, since only pagedaemon should need it.
Other threads will tolerate sleeping, and problems here are only symptoms
that something is going wrong in memory management. The cause, not the
symptoms, need to be fixed.
RUMP-visible code. Instead of checking that updateproc (aka ioflush,
aka syncer) will not sleep in PUFFS code, I check for any kernel thread:
after all none of them are designed to hang awaiting for a remote filesystem
operation to complete.
a memory allocation, or a response from the filesystem.
This avoids deadlocks in the following situations:
1) when memory is low: ioflush waits the fileystem, the fielsystem waits
for memory
2) when the filesystem does not respond (e.g.: network outage ona
distributed filesystem)
This is required to avoid data corruption bugs, where a getattr slices
itself within a setattr operation, and sets the size to the stall value
it got from the filesystem. That value is smaller than the one set by
setattr, and the call to uvm_vnp_setsize() trigged a spurious truncate.
The result is a chunk of zeroed data in the file.
Such a situation can easily happen when the ioflush thread issue a
VOP_FSYNC/puffs_vnop_sync/flushvncache/dosetattrn while andother process
do a sys_stat/VOP_GETATTR/puffs_vnop_getattr.
This mutex on size operation can be removed the day we decide VOP_GETATTR
has to operated on a locked vnode, since the other operations that touch
size already require that.
filesystem in which format extended attribute shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one byte length-pprefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtanined by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoid the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)
This removes the need for the SAVENAME and HASBUF namei flags.
Interrupt server wait only on certain signals (same set at nfs -i)
instead of all signals. According to the PR this helps with
"git clone" run on a puffs file system.