to initiate self destruct, i.e. unmount(MNT_FORCE). This, however,
is a semi-controlled self-destruct, since all caches are flushed
before the (possibly) violent unmount takes place.
context. This fixes a long-standing but seldomly seen deadlock,
where the kernel was holding pages busy (due to e.g. readahead
request) while waiting for the server to respond, and the server
made a callback into the kernel asking to invalidate those pages.
... or, well, theoretically fixes, since I didn't have any reliable
way of repeating the deadlock and I think I saw it only twice.
* it depended on the biglock (in a very cruel way)
* it was attached to userspace transactions rather than logical
fs operations
(If someone wants to revisit it some day, most of the stuff can be
reused from cvs history)
a valid optimization, but that's long gone and once VOP_INACTIVE is
called and the file server says that the vnode is going to be recycled,
it really is going to be recycled extra references gained or not.
with the file server happen through puffs_msg_enqueue() and
puffs_msg_wait() instead of having a billion different routines.
Build the existing system upon these two. Most importantly though,
decouple insertation into the op queue from the actual wait. This
is useful for a number of reasons coming soon to a cvs repo near you.
Rip the transport code completely out of puffs and generalize it
into an independent module which will be used for multiple purposes
in the future. This module is called the Pass-to-Userspace
Transporter (known as "putter" among friends).
This is very much work-in-progress and one dependency with puffs
remains: the request framing format.
The device name is still /dev/puffs, but that will change soon.
Users of puffs need the following in their kernel configs now:
pseudo-device putter
interacts with the userspace file server:
* since the kernel-user communication is not purely request-response
anymore (hasn't been since 2006), try to rename some "request" to
"message". more similar mangling will take place in the future.
* completely rework how messages are allocated. previously most of
them were borrowed from the stack (originally *all* of them),
but now always allocate dynamically. this makes the structure
of the code much cleaner. also makes it possible to fix a
locking order violation. it enables plenty of future enhancements.
* start generalizing the transport interface to be independent of puffs
* move transport interface to read/write instead of ioctl. the
old one had legacy design problems, and besides, ioctl's suck.
implement a very generic version for now; this will be
worked on later hopefully some day reaching "highly optimized".
* implement libpuffs support behind existing library request
interfaces. this will change eventually (I hate those interfaces)
committed something, issue an abort. The abort is done through
the regular op channel, e.g. failed mkdir leads to regular rmdir,
inactive and reclaim. No internal interface is planned currently
for the one file system out of a million which would implement it
to benefit from the one case in a billion where kernel resource
allocation actually does fail and out of that one case in a trillion
where internal vs. external would make a difference.
to be vaild errno values
* include string describing error in PUFFS_ERR
* get rid of union in puffs_req, it's nothing but trouble
* pass pmp to async i/o callbacks
kernel to the file server for silly things the file server did,
e.g. attempting to create a file with size VSIZENOTSET. The file
server can handle these as it chooses, but the default action is
for it to throw its hands in the air and sing "goodbye, cruel world,
it's over, walk on by".
was done separate of inserting the cookie into the lookup structure
and without any form of interlock. This could lead to the same
cookie pointing to two different nodes. Remedy the race by creating
a separate "checked and ready to be inserted" cookie list which
serves as an interlock without having to hold a fs-global creation
lock.
it has supplied us). If we fault pages which are at offset >= server
size, but less than the in-kernel vnode size, inform the file server
of the latest developments in file size before issueing the fault.
The avoids confusion with files which are not written start to finish.
fixes kern/36429 by yamt
go through VFS_ROOT() and allow to fetch it without locking it.
This allows us to call the cache flush operations also for the root
vnode and most notably fixes e.g. a "No such file or directory"
for a psshfs root directory ls -l when a file was locally deleted
and remotely re-created.
Also fix some sloppy programming in root node fetch (mostly cosmetic).
Actually, we do need separate "no references in file server" and
"noref + inactive" flags if we wish to correctly support unix open
file semantics and optimize away pre-reclaim cache flushes. So,
add PNODE_DYING which stands for norefs + inactive.
the kernel it has 0 references to the node in question. In other
words, this can be used to avoid inactive(), or, if the file server
does not implement inactive, prompt reclaim for removed nodes.
as we give a reference to userspace for the puffs_node for the
duration of the poll call. So reference count puffs_node separately
from the parent vnode. vref()/vrele() is not possible due to a possible
surprise visit from VOP_INACTIVE.
and other information instead of always using VDIR. To make this
possible without races, require all root node information already
in puffs_mount() and nuke puffs_start2() and the associated start
operation completely.
requested/inspired by Tobias Nygren
for nodes upon return from the userspace. Currently it can be used
to indicate that the file server should be notified of "inactive"
in case the file server has opted to not receive inactive every
time the reference count for a vnode drops to zero. (inactive is
a common event, almost never requires any action and must be executed
sychronously, so it is wasteful).
While doing this, cleanup the release-relock nonsense from the
vntouser*() arguments. It was never enabled and the whole LOCKEDVP()
concept was very broken to begin with.
when unmounting the file system in case of a certain timing (and
possibly some other conditions), a thread would wait on a condition
variable, while another thread broadcast the cv and immediately
proceeded to destroy it. The result was a system frozen completely
solid shorly after the process waiting for the cv woke up. So
introduce reference counting to synchronize destruction of the
resources in unmount.
I was able to repeat the problem only on my laptop in some special
cases, so I do not know how common it was. Ironically, killing
the file server process violently instead of unmount() didn't have
this problem because it never entered the unmount path from two
directions.
again. This is useful until locking is further developed and basically
any deadlocks can be solved by killing appropriate processes.
Thanks especially to Tommi Kyntola and Antti Louko for sitting down
with me and discussing resource ownership and locking strategies
in implementing this.
from putop. even though there's only one user currently, makes code
more readable
* move "delta" to a standard parameter in vntouser and get rid of the
specialcase vntouser_delta
bunch of bugs.
* park structures are now always allocated from a pool instead of a
mixed stack/malloc allocation
* get rid of the whole adjbuf concept, always just alloc the maximal
amount of memory to satisfy a request
* little regression: don't allow interrupting wait from file system
to userspace; this had problems already before, but now the problems
really started to shine through. I'll try to make this work again
some day.
* fix bmap to return a sensible value in runp