puffs_daemon(3) creates a pipe before forking, and the parent process
waits for the child to either complete puffs_mount(3) or fail. If a
user calls puffs_daemon(3) after puffs_mount(3), the function
deadlocks. While this error-reporting functionality is really a nice
thing to have, deadlocking is not great. If the filesystem has already
been mounted, puffs_daemon(3) should just daemonize the process and
return.
This became an issue because fuse_daemonize(3) in the FUSE API had no such
requirement, and some FUSE filesystems in the wild suffered deadlocks.
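The pipe handshake described above can be sketched as follows. This is a
minimal model under stated assumptions, not the libpuffs implementation:
wait_for_child_report(), setup_ok() and setup_fail() are hypothetical names,
and unlike puffs_daemon(3) the parent here returns the reported status instead
of exiting. It shows why the parent blocks: read() only returns once the child
writes a status or closes its end of the pipe. If the child kept the pipe open
and never reported, as when the mount step already happened before the fork,
the parent would wait forever.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical setup steps standing in for the work up to puffs_mount(3). */
int setup_ok(void)   { return 0; }
int setup_fail(void) { return 5; }

/*
 * Fork a child, let it run setup(), and make the parent block until the
 * child reports the result over a pipe. Returns the reported status, or
 * -1 if the child went away without reporting.
 */
int
wait_for_child_report(int (*setup)(void))
{
	int fds[2], status = -1;
	pid_t pid;
	ssize_t n;

	assert(setup != NULL);

	if (pipe(fds) == -1)
		return -1;

	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: run the setup step and report its outcome. */
		int rv = setup();

		close(fds[0]);
		if (write(fds[1], &rv, sizeof(rv)) != (ssize_t)sizeof(rv))
			_exit(1);
		_exit(0);
	}

	/* Parent: this read blocks until the child writes or exits. */
	close(fds[1]);
	n = read(fds[0], &status, sizeof(status));
	close(fds[0]);
	(void)waitpid(pid, NULL, 0);

	return n == (ssize_t)sizeof(status) ? status : -1;
}
```

Skipping the pipe entirely when the filesystem is already mounted removes the
blocking read from that code path.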
Chuck Silvers pointed out that voff_t was also supposed to be
kernel-only. The correct type to use in userland would be off_t, but
since changing vsize_t to either voff_t or off_t is an ABI change on
32-bit platforms, we use size_t knowing that it is technically
incorrect.
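To illustrate why widening the type is an ABI change, here is a small sketch.
The two structures are hypothetical and do not come from <puffs.h>; the
fixed-width fields stand in for a 32-bit size_t/vsize_t and a 64-bit off_t, as
found on typical 32-bit platforms. Replacing the field changes both the
structure size and the offset of the fields after it, so already-compiled
callers would disagree with the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record using a 32-bit length (stand-in for size_t). */
struct rec_size32 {
	uint32_t len;
	uint32_t flags;
};

/* Same record with a 64-bit length (stand-in for off_t). */
struct rec_off64 {
	int64_t  len;
	uint32_t flags;
};

/* Return 1 if both layouts are interchangeable, 0 otherwise. */
int
layouts_match(void)
{
	/* Sanity check: two 32-bit fields pack without padding. */
	assert(sizeof(struct rec_size32) == 2 * sizeof(uint32_t));

	return sizeof(struct rec_size32) == sizeof(struct rec_off64) &&
	    offsetof(struct rec_size32, flags) ==
	    offsetof(struct rec_off64, flags);
}
```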
<puffs.h> is a user-space header, and should not use any of
the kernel-only types. It is not reasonable to require userland
filesystems to #define _KERNTYPES.
FUSE filesystems do not expect to get metadata updates for [amc]time
and size; they update the values on their own after operations.
The PUFFS option PUFFS_KFLAG_NOFLUSH_META prevents regular metadata cache
flushes to the filesystem, and libperfuse uses it to match Linux FUSE
behavior.
While there, fix a bug in SETATTR: do not update the kernel metadata cache
from the SETATTR reply when the request is asynchronous, as we do not have
the reply yet.
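The SETATTR rule can be modelled in a few lines. This is only a sketch: the
types and the setattr_done() function are hypothetical and are not the puffs
kernel structures. It shows the one decision that matters here: the cache is
refreshed from the reply only when the request was synchronous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached metadata; not a puffs kernel structure. */
struct meta_cache {
	long size;
};

/* Hypothetical SETATTR request state. */
struct setattr_req {
	bool async;       /* sent without waiting for the reply */
	long reply_size;  /* only meaningful once a reply has arrived */
};

/*
 * Refresh the cache from the reply for synchronous requests only.
 * Returns true if the cache was updated.
 */
bool
setattr_done(struct meta_cache *mc, const struct setattr_req *req)
{
	assert(mc != NULL && req != NULL);

	if (req->async)
		return false;   /* no reply yet: reply_size is not valid */

	mc->size = req->reply_size;
	return true;
}
```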
libpuffs calls realpath() to obtain an absolute path to use for mounting.
If the obtained path is different from the one given by the caller, a
warning is issued. This included the situation where the path passed by
the caller just has trailing slashes, a situation where we just want them
to be stripped without a warning.
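A minimal sketch of the comparison, assuming the caller's path is normalized
before being compared with the realpath() result. The function names and the
fixed buffer size are hypothetical, and this is not the actual libpuffs code.

```c
#include <assert.h>
#include <string.h>

/* Remove trailing '/' characters in place, but keep a lone "/" (root). */
void
strip_trailing_slashes(char *path)
{
	size_t len;

	assert(path != NULL);

	len = strlen(path);
	while (len > 1 && path[len - 1] == '/')
		path[--len] = '\0';
}

/*
 * Return 1 if the resolved path differs from the requested one once
 * trailing slashes are ignored (a warning would be issued), else 0.
 */
int
path_changed(const char *requested, const char *resolved)
{
	char buf[1024];   /* hypothetical limit for this sketch */

	assert(requested != NULL && resolved != NULL);

	if (strlen(requested) >= sizeof(buf))
		return 1;     /* too long to normalize: treat as different */

	strcpy(buf, requested);
	strip_trailing_slashes(buf);
	return strcmp(buf, resolved) != 0;
}
```

With this, "/mnt/puffs//" compared against "/mnt/puffs" yields no warning,
while a path that resolves elsewhere (for example through a symlink) still
does.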
by the filesystem to send back information about the file.
This is used to implement PUFFS_OPEN_IO_DIRECT by which the filesystem
tells the kernel that read/write should bypass the page cache.
condition was addressed in libpuffs by counting lookups.
The fix assumes that cookies map to struct puffs_node, which has not
been documented as a requirement for filesystems using libpuffs. As an
example, we got burnt by this assumption in libp2k (kern/46734), and
we fixed it by actually mapping libp2k cookies to struct puffs_node.
It is unlikely, but there may be third-party filesystems that use cookies
not mapped to struct puffs_node, and they were left broken for now.
- we introduce a puffs_init() flag, PUFFS_FLAG_PNCOOKIE, that lets filesystems
inform libpuffs that they map cookies to struct puffs_node. If that flag
is used, the lookup/reclaim race condition fix is enabled. We enable the
flag for libp2k.
- filesystems that use puffs_pn_new() obviously use struct puffs_node
and gain PUFFS_FLAG_PNCOOKIE automatically even if they did not specify
it in puffs_init(). This includes all our in-tree PUFFS filesystems except
libp2k.
- for filesystems not willing to use struct puffs_node, we introduce a
reclaim2 vnop, which is reclaim with an additional lookup count argument.
This vnop lets the filesystem implement the lookup/reclaim race fix on
its own.
parent, keeping them active, and allowing .. to be looked up without sending
a request to the filesystem.
Enable the feature for perfused, as this is how FUSE works.
The normal kernel behavior is to retain inactive nodes in the freelist
until it runs out of vnodes. This has some merit for local filesystems,
where the cost of an allocation is about the same as the cost of a
lookup. But that does not hold for distributed filesystems.
On the other hand, keeping inactive nodes for a long time holds memory
in the file server process, and when the kernel runs out of vnodes, it
produces reclaim avalanches that increase latency for other operations.
We do not reclaim inactive vnodes immediately either, as they may be
looked up again shortly. Instead we introduce a grace time and we
reclaim nodes that have been inactive beyond the grace time.
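The grace-time policy can be sketched as a periodic sweep. The node record and
reclaim_expired() are hypothetical names made up for this example; the real
kernel code works on vnodes and is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical node record; field names are made up for this sketch. */
struct node {
	int    active;          /* non-zero while the node is in use */
	time_t inactive_since;  /* when the node last became inactive */
	int    reclaimed;       /* set once the node has been reclaimed */
};

/*
 * Reclaim every node that has been inactive for longer than the grace
 * time. Active and already-reclaimed nodes are left alone. Returns the
 * number of nodes reclaimed by this sweep.
 */
size_t
reclaim_expired(struct node *nodes, size_t n, time_t now, time_t grace)
{
	size_t i, count = 0;

	assert(nodes != NULL || n == 0);

	for (i = 0; i < n; i++) {
		struct node *np = &nodes[i];

		if (np->active || np->reclaimed)
			continue;
		if (now - np->inactive_since > grace) {
			np->reclaimed = 1;
			count++;
		}
	}
	return count;
}
```

A node that became inactive only recently survives the sweep, so a lookup that
comes back shortly afterwards does not pay for a new allocation.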
- Fix lookup/reclaim race condition.
The above improvement uncovered a race condition between lookup and
reclaim. If we reclaim a vnode associated with a userland cookie while
a lookup returning that same cookie is in progress, then the kernel ends
up with a vnode associated with a cookie that has been reclaimed in
userland. The next operation on the cookie will crash (or at least confuse)
the filesystem.
We fix this by introducing a lookup count in the kernel and in userland. On
reclaim, the kernel sends the count, which enables userland to detect
situations where it initiated a lookup that has not yet completed in the kernel.
In such a situation, the reclaim must be ignored, as the node is about
to be looked up again.
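The userland side of the lookup count can be modelled as below. This is a
sketch under assumptions, not the libpuffs code: struct fs_node and both
functions are hypothetical, and the exact bookkeeping in libpuffs may differ.
Userland counts the lookups it has answered, the reclaim carries the number
the kernel has seen, and a positive remainder means a lookup is still in
flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland node; not struct puffs_node. */
struct fs_node {
	int  nlookup;  /* lookups answered by userland, not yet reclaimed */
	bool freed;    /* set once the node has been released */
};

/* Userland answers a lookup that returns this node. */
void
node_lookup_reply(struct fs_node *np)
{
	assert(np != NULL);
	np->nlookup++;
}

/*
 * Handle a reclaim carrying the kernel's lookup count.
 * Returns true if the node was released, false if the reclaim was
 * ignored because a lookup has not reached the kernel yet.
 */
bool
node_reclaim(struct fs_node *np, int kernel_nlookup)
{
	assert(np != NULL && kernel_nlookup >= 0);

	np->nlookup -= kernel_nlookup;
	if (np->nlookup > 0)
		return false;   /* lookup still in flight: keep the node */

	np->freed = true;
	return true;
}
```

If userland has answered two lookups but the kernel reclaims with a count of
one, the node is kept; it is released only when a later reclaim accounts for
the remaining lookup.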