by the filesystem to send back information about the file.
This is used to implement PUFFS_OPEN_IO_DIRECT by which the filesystem
tells the kernel that read/write should bypass the page cache.
condition was addressed in libpuffs by counting lookups.
The fix assumes that cookies map to struct puffs_cookie, which has not
been documented as a requirement for filesystems using libpuffs. As an
example, we got burnt by this assumption in libp2k (kern/46734), and
we fixed bit by actually mapping libp2k cookies to struct puffs_node.
It is unlikely, but there may be third party filesystems that use cookies
unmapped to struct puffs_node, and they were left broken for now.
- we introduce a puffs_init() flag PUFFS_FLAG_PNCOOKIE that let filesystems
inform libpuffs that they map cookies to struct puffs_node. Is that flag
is used, the lookup/reclaim race condition fix is enabled. We enable the
flag for libp2k.
- filesystems that use puffs_pn_new() obviouslty use struct puffs_node
and gain PUFFS_FLAG_PNCOOKIE automatically even if they did not specify
it in puffs_init(). This include all our PUFFS filesystem in-tree except
libp2k.
- for filesystems not willing to use struct puffs_node, we introduce a
reclaim2 vnop, which is reclaim with an additionnal lookup count argument.
This vnop let the filesystem implement the lookup/reclaim race fix on
its own.
The normal kernel behavior is to retain inactive nodes in the freelist
until it runs out of vnodes. This has some merit for local filesystems,
where the cost of an allocation is about the same as the cost of a
lookup. But that situation is not true for distributed filesystems.
On the other hand, keeping inactive nodes for a long time hold memory
in the file server process, and when the kernel runs out of vnodes, it
produce reclaim avalanches that increase lattency for other operations.
We do not reclaim inactive vnodes immediatly either, as they may be
looked up again shortly. Instead we introduce a grace time and we
reclaim nodes that have been inactive beyond the grace time.
- Fix lookup/reclaim race condition.
The above improvement undercovered a race condition between lookup and
reclaim. If we reclaimed a vnode associated with a userland cookie while
a lookup returning that same cookiewas inprogress, then the kernel ends
up with a vnode associated with a cookie that has been reclaimed in
userland. Next operation on the cookie will crash (or at least confuse)
the filesystem.
We fix this by introducing a lookup count in kernel and userland. On
reclaim, the kernel sends the count, which enable userland to detect
situation where it initiated a lookup that is not completed in kernel.
In such a situation, the reclaim must be ignored, as the node is about
to be looked up again.
- setattr_ttl is updated to add a flag argument. Since it was not present in
a previous release, we can change its API
- write2 is introduced, this is write with an extra flag for FAF.
- fsync already has the FAF information in a flag and needs no change
- for other operations, FAF is unconditional
attribute and TTL fora newly created node. Instead extend puffs_newinfo
and add puffs_newinfo_setva() and puffs_newinfo_setttl()
- Remove node_mk_common_final in libperfuse. It used to set uid/gid for
a newly created vnode but has been made redundant along time ago since
uid and gid are properly set in FUSE header.
- In libperfuse, check for corner case where opc = 0 on INACTIVE and RECLAIM (how is it possible? Check for it to avoid a crash anyway)
- In libperfuse, make sure we unlimit RLIMIT_AS and RLIMIT_DATA so that
we do notrun out of memory because the kernel is lazy at reclaiming vnodes.
- In libperfuse, cleanup style of perfuse_destroy_pn()
attribute cache with filesystem provided TTL.
lookup, create, mknod, mkdir, symlink, getattr and setattr messages
have been extended so that attributes and their TTL can be provided
by the filesytem. lookup, create, mknod, mkdir, and symlink messages
are also extended so that the filesystem can provide name TTL.
The filesystem updates attributes and TTL using
puffs_pn_getvap(3), puffs_pn_getvattl(3), and puffs_pn_getcnttl(3)
When remove files using name from pnode, another link on this file
can be unlinked. E.g. "touch 1; ln 1 2; rm 2" will remove file named
"1". Thus puffs_null_node_remove should remove directory entry which
name is provided by pcn (as said in puffs_ops.3). Caller should
provide appropriately initialized pcn.
From Evgeniy Ivanov <lolkaantimat@gmail.com>
filesystem in which format extended attribute shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one byte length-pprefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtanined by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoid the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
Note: We use the storage of puffs_cache_write from puffs_ops for
this purpose. It's not issued by the kernel and hence currently
unused, and this saves us from the trouble of bumping the lib major
version.
Previously each request was executed on its own callcontext and
switched to every time the request was being processed. Now requests
borrow the mainloop context and switch only if/when they yield.
This takes two context switches away from a file system request
bringing down the typical amounts 2->0 (e.g. dtfs) and 4->2 (e.g.
psshfs).
The interfaces for manually executing requests changed a bit:
puffs_dispatch_create() and puffs_dispatch_exec() must now be used.
They are not tested, as nothing in-tree wants them and I doubt
anyone else is really interested in them either.
Also do some misc code cleanup related to execution contexts. The
"work-in-progress checkpoint" committed over a year ago was starting
to look slightly weed-infested.
them every time. Speeds up pure in-memory file systems such as
sysctlfs or dtfs quite a bit. For actual I/O-workhorses the result
is of course less tasty.
Get rid of the original puffs_req(3) framework and use puffs_framebuf(3)
instead for file system requests. It has the advantage of being
suitable for transporting a distributed message passing protocol
and therefore us being able to run the file system server on any
host.
Ok, puffs is not quite here yet: libpuffs needs to grow request
routing support and the message contents need to be munged into a
host independent format. Saying which format would be telling,
but it might begin with an X, end in an L and have the 13th character
in the middle. Keep an eye out for the sequels: Parts 3+m/n.
Ok, ok, a few more words about it: stop holding puffs_cc as a holy
value and passing it around to almost every possible place (popquiz:
which kernel variable does this remind you of?). Instead, pass
the natural choice, puffs_usermount, and fetch puffs_cc via
puffs_cc_getcc() only in routines which actually need it. This
not only simplifies code, but (thanks to the introduction of
puffs_cc_getcc()) enables constructs which weren't previously sanely
possible, say layering as a curious example.
There's still a little to do on this front, but this was the major
fs interface blast.
separately
* provide puffs_cc_getcc()
This is in preparation for the removal of you-should-guess-what as
an argument to routines here and there and everywhere.
alternative to the (vastly superior ;) continuation model. This
is very preliminary stuff and not compiled by default (which it
even won't do without some other patches I cannot commit yet).
The raison d'commit of the patch is a snippet which ensures proper
in-order dispatching of all operations, including those which don't
require a response. Previously many of them would be dispatched
simultaneosly, e.g. fsync and reclaim on the same node, which
obviously isn't all that nice for correct operation.
interacts with the userspace file server:
* since the kernel-user communication is not purely request-response
anymore (hasn't been since 2006), try to rename some "request" to
"message". more similar mangling will take place in the future.
* completely rework how messages are allocated. previously most of
them were borrowed from the stack (originally *all* of them),
but now always allocate dynamically. this makes the structure
of the code much cleaner. also makes it possible to fix a
locking order violation. it enables plenty of future enhancements.
* start generalizing the transport interface to be independent of puffs
* move transport interface to read/write instead of ioctl. the
old one had legacy design problems, and besides, ioctl's suck.
implement a very generic version for now; this will be
worked on later hopefully some day reaching "highly optimized".
* implement libpuffs support behind existing library request
interfaces. this will change eventually (I hate those interfaces)
kernel to the file server for silly things the file server did,
e.g. attempting to create a file with size VSIZENOTSET. The file
server can handle these as it chooses, but the default action is
for it to throw its hands in the air and sing "goodbye, cruel world,
it's over, walk on by".
support removal and addition of i/o file descriptors on the fly.
* detect closed file descriptors
* automatically free waiters of a dead file descriptor
* give the file server the possibility to specify a callback which
notifies of a dead file descriptor
* move loop function to be a property of the mainloop instead of
framebuf (doesn't change effective behaviour)
* add the possibility to configure a timespec parameter which
attempts to call the loop function periodically
* move the event loop functions from the puffs_framebuf namespace
to puffs_framev to differential between pure memory management
functions