to skip unnecessary flushing when layered file system vnodes are recycled.
this also prevents a deadlock with the dodgy LFS putpages routine.
fixes the non-LFS part of PR 36150.
break other implementations so lookup the physical number instead of
indexing it. Choosing random numbers here is legal according to the specs,
but not a logical choice and most likely done as a wierd kind of copy
protection.
Rogue implementation found to use this
*Microsoft CDIMAGE UDF
corresponding flags.
Revert softdep_trackbufs() to its state before vn_start_write() was added.
Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.
Welcome to 4.99.17
again. This is useful until locking is further developed and basically
any deadlocks can be solved by killing appropriate processes.
Thanks especially to Tommi Kyntola and Antti Louko for sitting down
with me and discussing resource ownership and locking strategies
in implementing this.
workaround for the problem analyzed more deeply in kern/30831. In
short, the problem is keeping the vnode on the mount point vnode
list during reclaim. If reclaim happens to sleep (as is a possibility
with smbfs due to calling vrele() and therefore possibly VOP_INACTIVE),
code going through the entire mountpoint vnode list will hit
half-reclaimed vnodes.
from putop. even though there's only one user currently, makes code
more readable
* move "delta" to a standard parameter in vntouser and get rid of the
specialcase vntouser_delta
bunch of bugs.
* park structures are now always allocated from a pool instead of a
mixed stack/malloc allocation
* get rid of the whole adjbuf concept, always just alloc the maximal
amount of memory to satisfy a request
* little regression: don't allow interrupting wait from file system
to userspace; this had problems already before, but now the problems
really started to shine through. I'll try to make this work again
some day.
* fix bmap to return a sensible value in runp
Remove offset argument, we should now find an HFS+ volume in any
of its standard places.
Based on work from and test image provided by Pelle Johansson.
kernel and flush it out all at once instead of continuous updating
* add support for delivering notifications to the file server about
when a page was written to (but disabled by default for now). the
file server can use this to request flushing or invalidating the
kernel page cache
(hfslib_open_volume) We are only interested in the catalgo and
extents header, so read the first 512 bytes, not the whole first
extent. Also makes mounting a lot faster.
possible to recover the system by just killing processes in case
a file server manages to recurse into itself either by fault of
file server implementation or by pilot error. The downside is that
the code is extremely hard to follow and practically screams out
for newlock2 (in addition to screaming "bug here"). The whole
PCATCH nonsense and induced megacomplexity can hopefully be avoided
in the future by tweaking other parts of the implementation.
positive values of errno and 0. Otherwise it can return internal values
such as EJUSTRETURN and screw things up.
thanks to Bill for reminding me to revisit this
Since 1.40, which introduced support for non-DEV_BSIZE media,
mounting msdos floppy returned ENOTTY.
This is because floppy driver does not support DIOCGPART or DIOCWEDGEINFO
ioctl.
Those ioctls should not be a requirement for mounting msdosfs.
This patch is made by Christian Biere.
VFS_ROOT() if the cookies match. Without this fix, if the root
vnode was reclaimed, doing lookups for dotdot from the root vnode
was possible. In practice this occured only through getcwd.
the user file server can't really keep up and just writing and writing
may result in kernel memory exhaustion. this lossage is also partially
due to the stupid way mtime + size info is handled currently, but that
should change soon (*knock knock* ;)
* score a few debug printfs
for putpages, as it assumes a vnode doesn't have any pages. For
mounts using the page cache this is simply not true. Rather,
prevent opening a regular file in write-mode. That way a vnode
can never have dirty pages which would need to be flushed (i.e.
written).
- don't use SAVESTART in calls to relookup() from unionfs,
just vref() the desired vnode when we need to.
- fix locking and refcounting in the unionfs EEXIST error cases.
- release any vnode locks before calling VFS_ROOT(), vfs_busy() is enough.
this allows us to simplify union_root() and fix PR 3006.
- union_lock() doesn't handle shared lock requests correctly,
so convert them to exclusive instead. fixes PR 34775.
- in relookup(), avoid reusing "dp" for different purposes,
the error handling wasn't right. (actually just get rid of dp.)
also, change relookup() to ignore LOCKLEAF and always return the
vnode locked since the callers already expect this.
Patch by Slava Semushin <slava.semushin@gmail.com>
Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
also to a place less panicy, namely fifo_fsync (because currently the
metadata information is update when the node is changed. This will
probably change soon, though).
servers. This is still pretty much on the level "if it breaks ...".
It should work for single-threaded servers which handle one operation
from start to finish in one go. Also, it does not yet totally
correctly synchronize metadata and data in some cases. So needless
to say, it needs improvement, but it is possible that will have to
wait for some lock revampage.
since DIOCGPART is going to fail. Unfortunately there is no way to get the
geometry information we need from the wedge; it would be nice for wedges
to support a geometry ioctl. The values we cannot retrieve are marked with
XXX.
- Add a lot more debugging.
completely different file system, we still might re-enter the same
puffs fs in case we execute something on the other file system,
which wants to get a new vnode and ends up recycling a puffs vnode
for the purpose. In this case the fs server will sleep in the
kernel until it itself handles the operation .... which of course
is a slightly unlikely event.
After analyzing the path from getcleanvnode() to the vnode cemetary,
identify that fsync and putpages (strategy) are the ones in danger
of striking a deadlock deal. Abuse the vnode flag VXLOCK to tell
them "this vnode is irreversably going to meet its maker, don't
care about user server return values" (failure is not acceptable
down the vgonel() path) and issue the respective operations as
Fire-And-Forget (FAF) operations. no wait -> no deadlock.
This of course is a "fix" skating on thin ice. A better, more
generic solution is already in sight, but will take more effort to
implement.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).
fixes the case where a directory lookup is done in a directory has never
been visted/listed; the search optimalisation that searches the directory
from where it left behind the last time would never reach the initial
offset of zero since it would always have at least processed one entry.
of rolling around VOP_FSYNC(). The user server will be given the
VFS_SYNC instruction and it can do its own equivalent of VOP_FSYNC()
if it pleases, no need for the kernel to explicitly issue #{vnodes}
FSYNCs.
kernel caching. Currently supported are only flushing the name
cache for a directory or flushing the name cache for the entire fs.
Also, get rid of PNODE_INACTIVE status, since it was racy and
essentially didn't work. All this on top of being useless in the
first place ....
kernel->server calls at that time and it allows a window where
operations use an incorrect root node cookie.
XXX: there's still a (very much smaller and biglock safe) race, but
that's going to be solved by some more thorough restructuring
from under someone waiting for the fs server response in puffs_unmount()
if the descriptor was closed during the response wait (such as bug
leading to a crash in fs implementation unmount()).
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().