condition was addressed in libpuffs by counting lookups.
The fix assumes that cookies map to struct puffs_cookie, which has not
been documented as a requirement for filesystems using libpuffs. As an
example, we got burnt by this assumption in libp2k (kern/46734), and
we fixed bit by actually mapping libp2k cookies to struct puffs_node.
It is unlikely, but there may be third party filesystems that use cookies
unmapped to struct puffs_node, and they were left broken for now.
- we introduce a puffs_init() flag PUFFS_FLAG_PNCOOKIE that let filesystems
inform libpuffs that they map cookies to struct puffs_node. Is that flag
is used, the lookup/reclaim race condition fix is enabled. We enable the
flag for libp2k.
- filesystems that use puffs_pn_new() obviouslty use struct puffs_node
and gain PUFFS_FLAG_PNCOOKIE automatically even if they did not specify
it in puffs_init(). This include all our PUFFS filesystem in-tree except
libp2k.
- for filesystems not willing to use struct puffs_node, we introduce a
reclaim2 vnop, which is reclaim with an additionnal lookup count argument.
This vnop let the filesystem implement the lookup/reclaim race fix on
its own.
parent, keeping them active, and allowing to lookup .. without sending
a request to the filesystem.
Enable the featuure for perfused, as this is how FUSE works.
The normal kernel behavior is to retain inactive nodes in the freelist
until it runs out of vnodes. This has some merit for local filesystems,
where the cost of an allocation is about the same as the cost of a
lookup. But that situation is not true for distributed filesystems.
On the other hand, keeping inactive nodes for a long time hold memory
in the file server process, and when the kernel runs out of vnodes, it
produce reclaim avalanches that increase lattency for other operations.
We do not reclaim inactive vnodes immediatly either, as they may be
looked up again shortly. Instead we introduce a grace time and we
reclaim nodes that have been inactive beyond the grace time.
- Fix lookup/reclaim race condition.
The above improvement undercovered a race condition between lookup and
reclaim. If we reclaimed a vnode associated with a userland cookie while
a lookup returning that same cookiewas inprogress, then the kernel ends
up with a vnode associated with a cookie that has been reclaimed in
userland. Next operation on the cookie will crash (or at least confuse)
the filesystem.
We fix this by introducing a lookup count in kernel and userland. On
reclaim, the kernel sends the count, which enable userland to detect
situation where it initiated a lookup that is not completed in kernel.
In such a situation, the reclaim must be ignored, as the node is about
to be looked up again.
- setattr_ttl is updated to add a flag argument. Since it was not present in
a previous release, we can change its API
- write2 is introduced, this is write with an extra flag for FAF.
- fsync already has the FAF information in a flag and needs no change
- for other operations, FAF is unconditional
attribute and TTL fora newly created node. Instead extend puffs_newinfo
and add puffs_newinfo_setva() and puffs_newinfo_setttl()
- Remove node_mk_common_final in libperfuse. It used to set uid/gid for
a newly created vnode but has been made redundant along time ago since
uid and gid are properly set in FUSE header.
- In libperfuse, check for corner case where opc = 0 on INACTIVE and RECLAIM (how is it possible? Check for it to avoid a crash anyway)
- In libperfuse, make sure we unlimit RLIMIT_AS and RLIMIT_DATA so that
we do notrun out of memory because the kernel is lazy at reclaiming vnodes.
- In libperfuse, cleanup style of perfuse_destroy_pn()
attribute cache with filesystem provided TTL.
lookup, create, mknod, mkdir, symlink, getattr and setattr messages
have been extended so that attributes and their TTL can be provided
by the filesytem. lookup, create, mknod, mkdir, and symlink messages
are also extended so that the filesystem can provide name TTL.
The filesystem updates attributes and TTL using
puffs_pn_getvap(3), puffs_pn_getvattl(3), and puffs_pn_getcnttl(3)
When remove files using name from pnode, another link on this file
can be unlinked. E.g. "touch 1; ln 1 2; rm 2" will remove file named
"1". Thus puffs_null_node_remove should remove directory entry which
name is provided by pcn (as said in puffs_ops.3). Caller should
provide appropriately initialized pcn.
From Evgeniy Ivanov <lolkaantimat@gmail.com>
When puffs_null_node_rename() overwrites existing file, its pnode
must be removed, because src pnode already represents this file.
From Evgeniy Ivanov <lolkaantimat@gmail.com>
filesystem in which format extended attribute shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one byte length-pprefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtanined by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoid the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
This makes the size the same on 64bit archs. Don't bother bumping
any version, since you'd have explicitly had to jump through some
hoops to use pathconf before.