RUMP-visible code. Instead of checking that updateproc (aka ioflush,
aka syncer) will not sleep in PUFFS code, I check for any kernel thread:
after all, none of them is designed to hang waiting for a remote filesystem
operation to complete.
a memory allocation, or a response from the filesystem.
This avoids deadlocks in the following situations:
1) when memory is low: ioflush waits for the filesystem, and the filesystem
waits for memory
2) when the filesystem does not respond (e.g. a network outage on a
distributed filesystem)
This is required to avoid data corruption bugs, where a getattr slips in
during a setattr operation and sets the size to the stale value it got
from the filesystem. That value is smaller than the one set by setattr,
and the call to uvm_vnp_setsize() triggered a spurious truncate.
The result is a chunk of zeroed data in the file.
Such a situation can easily happen when the ioflush thread issues a
VOP_FSYNC/puffs_vnop_sync/flushvncache/dosetattr while another process
does a sys_stat/VOP_GETATTR/puffs_vnop_getattr.
This mutex on size operations can be removed once we decide that VOP_GETATTR
has to operate on a locked vnode, since the other operations that touch
size already require that.
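A minimal userland sketch of the idea, assuming a per-node size mutex that both operations hold across their whole request/reply round trip. The struct, the server_* stand-ins and the sketch_* functions are hypothetical, not the puffs code; assigning cached_size stands in for uvm_vnp_setsize().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical node: the cached size may only change under size_lock. */
struct sketch_node {
	pthread_mutex_t size_lock;   /* serializes size-changing operations */
	uint64_t        cached_size; /* kernel-side size (uvm_vnp_setsize()) */
	uint64_t        server_size; /* size stored by the simulated server */
};

/* Stand-ins for the request/reply round trips to the file server. */
static uint64_t server_getattr(const struct sketch_node *n)
{
	return n->server_size;
}

static void server_setattr(struct sketch_node *n, uint64_t size)
{
	n->server_size = size;
}

/* setattr: hold the lock from sending the request to updating the size. */
void sketch_setattr_size(struct sketch_node *n, uint64_t newsize)
{
	assert(n != NULL);
	pthread_mutex_lock(&n->size_lock);
	server_setattr(n, newsize);
	n->cached_size = newsize;
	pthread_mutex_unlock(&n->size_lock);
}

/*
 * getattr: the reply is applied under the same lock, so a size fetched
 * before a concurrent setattr cannot overwrite the size that setattr sets.
 */
uint64_t sketch_getattr_size(struct sketch_node *n)
{
	uint64_t size;

	assert(n != NULL);
	pthread_mutex_lock(&n->size_lock);
	size = server_getattr(n);
	n->cached_size = size;
	pthread_mutex_unlock(&n->size_lock);
	return size;
}
```

Because getattr holds the same mutex while its request is in flight, a reply carrying the old size can no longer land between setattr's request and its size update.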
- Enable VOP tmpfs_whiteout().
- Support ISWHITEOUT in tmpfs_alloc_file().
- Support DOWHITEOUT in tmpfs_remove() and tmpfs_rmdir().
- Make rmdir on a directory containing whiteouts work.
Should fix PR #35112 (tmpfs doesn't play well with unionfs).
Fixes PR kern/36681. tmpfs now survives dirconc, all our vfs/tmpfs
tests and rename races in atf, and a bunch of hand-written tests
that I'd commit if atf didn't find them highly indigestible.
ok dholland
- union_close() has to lock/unlock the lower vnode.
- union_fsync() has to call spec_fsync() for the union vnode.
- union_strategy() must allow writes to devices on the lower file system.
- union_bwrite() was completely missing.
as zero. Make it advertise one (no_trunc == true).
Names longer than NAME_MAX (255) will never pass namei() btw.
Fixes PR #43670 (msdosfs claims support for filenames longer than {NAME_MAX},
but fails)
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
This should fix cross-build problems, but I can't really test
that now, so I am not re-enabling the inclusion of v7fs support
in makefs.
sys/stdarg.h and expect the compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
deriving va_list as required by the standards.
Renaming a file of any non-directory type over another file of any
other non-directory type is OK -- they need not match as long as
neither is a directory, so loosen the kassert to reflect this.
XXX Need to write test cases for this.
ok dholland, rmind
filesystem in which format extended attributes shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one-byte length-prefixed, non-NUL-terminated strings, used for
extattr_list_file(2), which are obtained by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoids the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
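The conversion libperfuse still has to perform (FUSE returns NUL-terminated names while the kernel may have asked for EXTATTR_LIST_PREFIXLEN) can be sketched as follows. The function name is hypothetical; names longer than 255 bytes are rejected because a one-byte prefix cannot describe them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert a list of NUL-terminated names (listxattr(2) style) into
 * one-byte length-prefixed names (extattr_list_file(2) style).
 * Returns the number of bytes written to dst, or (size_t)-1 if the
 * input is malformed or dst is too small.
 */
size_t nulterm_to_prefixlen(const char *src, size_t srclen,
    unsigned char *dst, size_t dstlen)
{
	size_t in = 0, out = 0;

	assert(src != NULL || srclen == 0);
	while (in < srclen) {
		const char *nul = memchr(src + in, '\0', srclen - in);
		size_t len;

		if (nul == NULL)              /* last name is not terminated */
			return (size_t)-1;
		len = (size_t)(nul - (src + in));
		if (len > 255)                /* does not fit a one-byte prefix */
			return (size_t)-1;
		if (len + 1 > dstlen - out)   /* room for prefix + name */
			return (size_t)-1;
		dst[out] = (unsigned char)len;
		memcpy(dst + out + 1, src + in, len);
		out += len + 1;
		in += len + 1;                /* skip the name and its NUL */
	}
	return out;
}
```

Both formats spend exactly one extra byte per name (a prefix instead of a terminator), so the converted list is the same size as the input.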
ubc_zerorange(struct uvm_object *, off_t, size_t, int), changing
the first argument to a uvm_object and adding a flags argument.
Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. ubc_purge() no longer panics when unmounting tmpfs.
Keep uvm_vnp_zerorange() until the next kernel version bump.
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
cycle (destruction part). Perform link counting in tmpfs_dir_attach()
and tmpfs_dir_detach(), instead of in alloc/free and at arbitrary places.
Fixes PR/44285, PR/44288, PR/44657 and likely PR/42484.
- Fix the race between the lookup and inode destruction. Fixes PR/43167
and its duplicates PR/40088, PR/40757.
- Improve tmpfs_rename() locking a little, fix kqueue event notifications
and also fix PR/43617. Add simplistic tmpfs_parentcheck_p(); to be
expanded and used for further rename() locking fixes.
- Cache directory entry "hint" in the tmpfs node, add tmpfs_dir_cached(),
and thus avoid unnecessary lookup in tmpfs_remove() and tmpfs_rmdir().
- Set correct _PC_FILESIZEBITS value in tmpfs_pathconf(). Fixes PR/43576.
- A few minor fixes.
already prevented). File systems are no longer responsible for checking this.
Clean up and add asserts (note that dvp == vp cannot happen in vop_link).
OK dholland@
Maintain a tree of file handles, create nodes from msdosfs_vptofh() and keep
them until either the file gets unlinked or the file system gets unmounted.
Fixes the msdosfs part of PR #43745 (fhopen of an unlinked file causes problems
on multiple file systems)
fixing the return value of tmpfs_fhtovp() in the not-found case.
When vmlocking2 was merged to head (Jan 2008 !!) the inode numbering was
changed. Before, inodes were numbered 2..tm_nodes_max-1; since the
merge, the numbers are derived from the node's memory address.
Fixes PR #43605 (tmpfs file handles are broken)
and after all io but before actually updating the cluster chain.
Both uvm_vnp_zerorange() and vtruncbuf() call get/putpages -> bmap -> pcbmap
and here the fat cache gets updated with information no longer valid after
truncation.
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.
See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
one routine called from here (unix2dosfn) expects and uses all of
a [12].
This may fix the "stack size exceeded" problem which has been
triggering in gson's test runs. (i'm not entirely sure why it
doesn't trigger in anyone else's env)
implementations that use `random' numbers for the physical partitions, breaking
lots of implementations. A known culprit is Microsoft Windows 7.
Not only the partition mappings but also the metadata partition files need to be protected against this.
* fix copying of the extents of the metadata node to the metadatamirror node;
it was not copying all extents.
* fix truncating the metadata partition:
  * fix endian conversions
  * fix information length calculation so it's truncated to the right length!
* allow for setting a maximum extent length in extent merging. This is needed
since extents in the metadata partition files are only to be in allocation
unit sizes.
* adjust grow and shrink node to set the granularity of the maximum length of
an extent when encountering a metadatafile or metadatamirror file.
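A sketch of capped extent merging, assuming extents are sorted by start block and that every extent length and max_len are multiples of the allocation unit. struct extent and merge_extents() are hypothetical names, not the UDF driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: a run of contiguous blocks starting at lb. */
struct extent {
	uint32_t lb;   /* first block */
	uint32_t len;  /* length in blocks */
};

/*
 * Merge adjacent extents in place without letting any merged extent
 * grow beyond max_len blocks.  Blocks that do not fit stay in the next
 * extent.  Returns the new number of extents.
 */
size_t merge_extents(struct extent *ext, size_t n, uint32_t max_len)
{
	size_t out = 0;

	assert(max_len > 0);
	for (size_t i = 0; i < n; i++) {
		if (out > 0) {
			struct extent *prev = &ext[out - 1];

			if (prev->lb + prev->len == ext[i].lb) {
				/* Move as many blocks as still fit into prev. */
				uint32_t room = prev->len < max_len ?
				    max_len - prev->len : 0;
				uint32_t take = ext[i].len < room ? ext[i].len : room;

				prev->len += take;
				ext[i].lb += take;
				ext[i].len -= take;
				if (ext[i].len == 0)
					continue;   /* fully absorbed */
			}
		}
		ext[out++] = ext[i];
	}
	return out;
}
```

When max_len and all input lengths are multiples of the allocation unit, every amount moved is too, so both merged extents and leftovers keep allocation-unit granularity.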
missing at least opaque directory support, but until someone figures
out how that should work on ffs (see PR kern/44383), there's
no point in trying to figure out how it should work here.
erroneously entered as the latin1 file name in the dirhash, whereas the matching
routine assumes both are UTF-8. This would result in a file being created but not
stat-able since the dirhash couldn't find the entry unless it was remounted.
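The fix amounts to hashing the same encoding on both sides. A sketch, assuming the name is converted from latin1 to UTF-8 before it enters the dirhash; latin1_to_utf8() and the FNV-1a based name_hash() are illustrative stand-ins, not the UDF dirhash functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert ISO-8859-1 to UTF-8.  Each latin1 byte is the code point of
 * the same value: bytes < 0x80 are copied, the rest become two bytes.
 * Returns the UTF-8 length, or (size_t)-1 if dst is too small.
 * The output is not NUL-terminated.
 */
size_t latin1_to_utf8(const unsigned char *src, size_t srclen,
    unsigned char *dst, size_t dstlen)
{
	size_t out = 0;

	assert(src != NULL || srclen == 0);
	for (size_t i = 0; i < srclen; i++) {
		unsigned char c = src[i];

		if (c < 0x80) {
			if (out + 1 > dstlen)
				return (size_t)-1;
			dst[out++] = c;
		} else {
			if (out + 2 > dstlen)
				return (size_t)-1;
			dst[out++] = (unsigned char)(0xC0 | (c >> 6));
			dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}
	return out;
}

/* Hypothetical dirhash key (32-bit FNV-1a), always fed UTF-8 bytes. */
uint32_t name_hash(const unsigned char *s, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= s[i];
		h *= 16777619u;
	}
	return h;
}
```

Entering the converted bytes means a later lookup by the UTF-8 name computes the same key and finds the entry without a remount.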
parent dir) associated with SAVESTART in relookup().
Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.
The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.
Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)
VAT writeout was done locked while marked locked as were the readin and
writeout of the metadata partition space table.
While here, also protect the (vp) argument of the UDF_SET_SYSTEMFILE() macro.
Tested on UDF 1.50 sequential, UDF 2.01 RW and UDF 2.50 metadata RW media.
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)
This removes the need for the SAVENAME and HASBUF namei flags.
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.
Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).
The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
Interrupt server wait only on certain signals (same set as nfs -i)
instead of all signals. According to the PR this helps with
"git clone" run on a puffs file system.
1. Fix inverted node order, so that a negative value from the comparison
operator represents the lower (left) node, and a positive one the higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.
XXX: Perhaps rename rb.h to rbtree.h, since we are cleaning up.
1-3 address the PR/43488 by Jeremy Huddleston.
Passes RB-tree regression tests.
Reviewed by: matt@, christos@
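The conventions in items 1 to 4 can be illustrated with a small sketch. It is deliberately an unbalanced binary search tree, not a red-black tree, and all names (struct tree, tree_insert(), cmp_items()) are hypothetical rather than the rbtree(3) API.

```c
#include <assert.h>
#include <stddef.h>

/* Comparator: <0 if a is lower (goes left), >0 if higher (goes right). */
typedef int (*cmp_nodes_fn)(void *ctx, const void *a, const void *b);

struct tnode {
	struct tnode *left, *right;
};

struct tree {
	struct tnode *root;
	cmp_nodes_fn  compare;
	void         *ctx;     /* "context" handed to every comparison */
	size_t        offset;  /* offset of the struct tnode in each object */
};

static struct tnode *obj2node(const struct tree *t, void *obj)
{
	return (struct tnode *)((char *)obj + t->offset);
}

static void *node2obj(const struct tree *t, struct tnode *n)
{
	return (char *)n - t->offset;
}

/*
 * Insert obj and return it, or return the already present object that
 * compares equal (obj is then not linked in).
 */
void *tree_insert(struct tree *t, void *obj)
{
	struct tnode **link;
	struct tnode *n;

	assert(t != NULL && obj != NULL);
	link = &t->root;
	while (*link != NULL) {
		void *cur = node2obj(t, *link);
		int diff = t->compare(t->ctx, obj, cur);

		if (diff == 0)
			return cur;
		link = diff < 0 ? &(*link)->left : &(*link)->right;
	}
	n = obj2node(t, obj);
	n->left = n->right = NULL;
	*link = n;
	return obj;
}

/* Example object embedding its tree linkage, ordered by key. */
struct item {
	int          key;
	struct tnode node;
};

static int cmp_items(void *ctx, const void *a, const void *b)
{
	int ka = ((const struct item *)a)->key;
	int kb = ((const struct item *)b)->key;

	(void)ctx; /* unused here */
	return (ka > kb) - (ka < kb);
}
```

Inserting a second object with an equal key returns the first one, so a caller can detect duplicates without a separate lookup.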
Release the hash list lock before calling getnewvnode() and check the
hash list again like other file systems do.
Take v_interlock before calling vget().
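The lookup/recheck pattern can be sketched in userland, assuming a single-bucket hash list protected by a mutex and malloc() standing in for getnewvnode(). Names are hypothetical; reference counting and the v_interlock/vget() step are left out, and nodes are never freed, which is what makes returning a node after unlocking safe in this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct hnode {
	struct hnode *next;
	unsigned long key;
};

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static struct hnode *hash_head;   /* single bucket keeps the sketch short */

static struct hnode *hash_lookup_locked(unsigned long key)
{
	for (struct hnode *n = hash_head; n != NULL; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}

struct hnode *hash_get(unsigned long key)
{
	struct hnode *n, *fresh;

	pthread_mutex_lock(&hash_lock);
	n = hash_lookup_locked(key);
	pthread_mutex_unlock(&hash_lock);
	if (n != NULL)
		return n;

	/* The allocation may block, so the hash lock is not held here. */
	fresh = malloc(sizeof(*fresh));
	if (fresh == NULL)
		return NULL;
	fresh->key = key;

	pthread_mutex_lock(&hash_lock);
	n = hash_lookup_locked(key);   /* check again: did we lose a race? */
	if (n == NULL) {
		fresh->next = hash_head;
		hash_head = fresh;
		n = fresh;
		fresh = NULL;
	}
	pthread_mutex_unlock(&hash_lock);
	free(fresh);                   /* discard ours if another thread won */
	assert(n->key == key);
	return n;
}
```

Without the second lookup, two threads missing the same key could both insert a node, leaving duplicate entries in the hash list.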
in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().
Welcome to 5.99.34.
Discussed on tech-kern.
- remove unnecessary check that would prevent it from mounting newer nilfs
images. A field has been added in the segment summary.
- store blocks of files on their virtual block number
- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
LK_SHARED and LK_NOWAIT. LK_INTERLOCK is no longer allowed as it
makes no sense here.
- VOP_ISLOCKED(vp): Remove the return value LK_EXCLOTHER, which has been
unused for some time. Mark this operation as "diagnostic only".
Making a lock decision based on this operation is no longer allowed.
Discussed on tech-kern.
accounting. Use wired memory (which can be limited) for meta-data, and
kmem(9) for string allocations.
Close PR/31944. Fix PR/38361 while here. OK ad@.
This makes the size the same on 64bit archs. Don't bother bumping
any version, since you'd have explicitly had to jump through some
hoops to use pathconf before.
to do two things:
1) properly set "recycle?" in inactive
2) easily check if we are renaming a removed vnode. without the
check, it was possible to enter a dirent in the file system for
a removed (and hence scheduled to be vcleaned) vnode. this would
lead to the successful vget() of a clean vnode. the use of the
cleaned vnode was, however, less successful, except for purposes
of crashing.
the value before the call (yea, changing relookup would probably
be smart, but other file systems already initialize vpp, so I'm
letting someone else experiment with tylenol od).
system drivers where it was missing from and fixes one buggy
implementation. The arguably weird semantics of the check are
maintained (v_size vs. va_bytes, overwrite).
unlikely event it did fail, the kernel would double lutz to doom
(in failure devvp now remains unmountable until reboot. fans
of complicated & untested error branches may attempt to gunk this
up. i'm not one of them).
* cosmetic surgery: cut extra ;
msdosfs and cd9660 are the only filesystems that verify the filesystem
type in the label. This is the wrong place, sanity checks should only
rely on the inner structure of the filesystem (like signatures or
magic numbers).
msdosfs also used the device type information from the label to
deduce a filesystem parameter heuristically for the gemdos variant.
If there is no information inside the filesystem data itself, this
should be an explicit mount option.
operations. This prevents kernel memory leaks (one of which happened
every time the file system was unmounted via PUFFSOP_UNMOUNT ...
and incidentally would've been trivially caught with the old
malloc(9) interface. I wonder if the message is to use a ton of
pools instead of regression-attractive kmem interface).
reference the puffs_node before sending the request to the file
server. This diminishes the window where the inode can be reclaimed
and be invalidated before it is accessed (but does not completely
eliminate the race, as that is a caller problem which we cannot
fix here).
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n'match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.
no functional change
to initiate self destruct, i.e. unmount(MNT_FORCE). This, however,
is a semi-controlled self-destruct, since all caches are flushed
before the (possibly) violent unmount takes place.
context. This fixes a long-standing but seldom seen deadlock,
where the kernel was holding pages busy (due to e.g. readahead
request) while waiting for the server to respond, and the server
made a callback into the kernel asking to invalidate those pages.
... or, well, theoretically fixes, since I didn't have any reliable
way of repeating the deadlock and I think I saw it only twice.