The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
a bug: the homegrown version neglected to unlock vp
* don't reimplement eopnotsupp()
* init genfs_node earlier vget, protects against error paths in vget
from attempting to destroy a non-inited node
same file were renamed simultaneously, there was a window where
directory entry cached in the vnode during lookup would be replaced
before calling rename. This lead to one directory entry getting
renamed twice and the other one getting zero renames. Do a relookup
in rename to make sure we have the correct directory entry.
Thanks go to Greg Oster for reporting the problem, helping with
debugging and thoroughly testing the patch.
called with NOWAIT.
XXX: this is just a quick fix to stop the diagnostic panic. I
think ENOSPC should be treated elsewhere depending on how much
memory tmpfs claims.
thing and release locks before the userspace wait for operations
which release the lock before exit from the method in any case.
However, releasing the lock after inserting the request on the
operation queue gives us proper ordering possibilities in userspace
(at least if that bit were implemented, but I don't think there
any file system in userspace that depends on kernel locking and
probably there never should be one).
inspired by a conversation with Nacho Navarro
with the file server happen through puffs_msg_enqueue() and
puffs_msg_wait() instead of having a billion different routines.
Build the existing system upon these two. Most importantly though,
decouple insertation into the op queue from the actual wait. This
is useful for a number of reasons coming soon to a cvs repo near you.
unlocked vnode when trying to rename a directory. The fix was to
shuffle some bits around and #pray.
The rename routine actually needs a very very major wide-angle whopping:
* it takes locks out-of-order
* it deals with references from SAVESTART lookups in interesting ways
* I doubt there is any guarantee for correct operation if there
are multiple concurrent accesses
* the error branches might just as well call panic() directly
Rip the transport code completely out of puffs and generalize it
into an independent module which will be used for multiple purposes
in the future. This module is called the Pass-to-Userspace
Transporter (known as "putter" among friends).
This is very much work-in-progress and one dependency with puffs
remains: the request framing format.
The device name is still /dev/puffs, but that will change soon.
Users of puffs need the following in their kernel configs now:
pseudo-device putter
when vclean()ing. Pending an adventure to the genfs/vm labyrinth
to fix this properly, compensate here by not allowing unstrategic
(no pun) return values. They are always due to the userspace server
crashing anyway, so it's no big deal if we lie about the final
resting place of the pages.
userspace, since it doesn't contain any information yet. I should
still rework this more so this is just a quickie to get the read/write
style interface more up to speed with the ioctl version.
interacts with the userspace file server:
* since the kernel-user communication is not purely request-response
anymore (hasn't been since 2006), try to rename some "request" to
"message". more similar mangling will take place in the future.
* completely rework how messages are allocated. previously most of
them were borrowed from the stack (originally *all* of them),
but now always allocate dynamically. this makes the structure
of the code much cleaner. also makes it possible to fix a
locking order violation. it enables plenty of future enhancements.
* start generalizing the transport interface to be independent of puffs
* move transport interface to read/write instead of ioctl. the
old one had legacy design problems, and besides, ioctl's suck.
implement a very generic version for now; this will be
worked on later hopefully some day reaching "highly optimized".
* implement libpuffs support behind existing library request
interfaces. this will change eventually (I hate those interfaces)
userspace call, namely our private mount structure, in the activation
record. This avoids problems in situations where the userspace
file server happens to die during our upcall and the vnode is
forcibly reclaimed before we roll back to the current stack frame.
committed something, issue an abort. The abort is done through
the regular op channel, e.g. failed mkdir leads to regular rmdir,
inactive and reclaim. No internal interface is planned currently
for the one file system out of a million which would implement it
to benefit from the one case in a billion where kernel resource
allocation actually does fail and out of that one case in a trillion
where internal vs. external would make a difference.
to be vaild errno values
* include string describing error in PUFFS_ERR
* get rid of union in puffs_req, it's nothing but trouble
* pass pmp to async i/o callbacks
locked. Ideally the function should be rewritten to do things in
a different order, but this tries to keep changes minimal aiming
for a possible netbsd-4 pullup.
fixes PR kern/37034
kernel to the file server for silly things the file server did,
e.g. attempting to create a file with size VSIZENOTSET. The file
server can handle these as it chooses, but the default action is
for it to throw its hands in the air and sing "goodbye, cruel world,
it's over, walk on by".
was done separate of inserting the cookie into the lookup structure
and without any form of interlock. This could lead to the same
cookie pointing to two different nodes. Remedy the race by creating
a separate "checked and ready to be inserted" cookie list which
serves as an interlock without having to hold a fs-global creation
lock.
before copying them out, rather just use a single one. Further, follow
the example of tmpfs and others by simply allocating on the stack.
This should have the side-effect of silencing false Coverity reports like
CID 4559 and 4554.
VOP_LOOKUP ignores LOCKPARENT completely, so make this ignore it also.
XXX: tested only with rump, but I can't really see how this worked
at all before
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
* panic if flushing system nodes fails: we have released too many
resources and would die anyway the next time unmount is attempted
* KASSERT that VOP_CLOSE succeeds, but always return 0. once again
we have released too many resources
XXX: maybe rewrite to be a bit more robust
- Raise an error if renaming a file to a directory.
- Raise an error if renaming a directory to a file.
- Raise an error if renaming a directory to a non-empty directory.
- Properly allow renaming a directory to an empty directory.
The system could previously crash if the kernel had DIAGNOSTIC enabled,
as this triggered a bogus assertion.
Problem found by Geoff Wing.
it has supplied us). If we fault pages which are at offset >= server
size, but less than the in-kernel vnode size, inform the file server
of the latest developments in file size before issueing the fault.
The avoids confusion with files which are not written start to finish.
fixes kern/36429 by yamt
the file server inactive less over-eagerly executed and masks some
problems with the new mounting style. Effectively, it makes some
file systems such as psshfs mountable again (only without -o allops).
fs code is a kernel buffer, pass though the length of the buffer as well.
Since the length of the userspace buffer isn'it (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
1) Comply with the way buffercache(9) is intended to be used. Now we
read in single blocks of EFS_BB_SIZE, never taking in variable
length extents with a single bread() call.
2) Handle symlinks with more than one extent. There's no reason for
this to ever happen, but it's handled now.
3) Finally, add a hint to our iteration initialiser so we can start
from the desired offset, rather than naively looping through from
the beginning each time. Since we can binary search the correct
location quickly, this improves large sequential reads by about
40% with 128MB files. Improvement should increase with file size.
When reading a file, we would erroneously iterate to the next extent
before having filled the entire uio request. This lead to unnecessary
extent iteration and excessive calls to efs_read.
Sequential read performance has doubled in the uncached case and
quadrupled when data is buffered.
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
go through VFS_ROOT() and allow to fetch it without locking it.
This allows us to call the cache flush operations also for the root
vnode and most notably fixes e.g. a "No such file or directory"
for a psshfs root directory ls -l when a file was locally deleted
and remotely re-created.
Also fix some sloppy programming in root node fetch (mostly cosmetic).
Actually, we do need separate "no references in file server" and
"noref + inactive" flags if we wish to correctly support unix open
file semantics and optimize away pre-reclaim cache flushes. So,
add PNODE_DYING which stands for norefs + inactive.
"noref + inactive" flags if we wish to correctly support unix open
file semantics and optimize away pre-reclaim cache flushes. So,
add PNODE_DYING which stands for norefs + inactive.
the kernel it has 0 references to the node in question. In other
words, this can be used to avoid inactive(), or, if the file server
does not implement inactive, prompt reclaim for removed nodes.
as we give a reference to userspace for the puffs_node for the
duration of the poll call. So reference count puffs_node separately
from the parent vnode. vref()/vrele() is not possible due to a possible
surprise visit from VOP_INACTIVE.