address truncated to 31 bits (required for 32-bit readdir compatibility,
e.g. linux32). Instead, assign 2^31 range using the following logic:
- The first half of the 2^31 range is assigned incrementally (the fast path).
- When that half is exhausted, use the second half of the range, managed with vmem(9).
It will require 2 billion files per directory to trigger vmem(9) usage.
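A minimal sketch of the two-level cookie allocation described above, assuming a scaled-down cookie space (SEQ_BITS = 8 instead of 31) so that the slow path is easy to reach. vmem(9) is kernel-only, so a plain bitmap stands in for it; struct dir_seq, dir_seq_alloc() and dir_seq_free() are hypothetical names, not the tmpfs functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, scaled-down model of the directory-cookie scheme.
 * Real cookies must fit in 31 bits; SEQ_BITS is shrunk for illustration.
 * A bitmap stands in for vmem(9) on the second half of the range.
 */
#define SEQ_BITS   8u
#define SEQ_RANGE  (1u << SEQ_BITS)   /* whole cookie space */
#define SEQ_HALF   (SEQ_RANGE / 2)    /* end of the fast-path half */
#define SEQ_FAIL   UINT32_MAX

struct dir_seq {
	uint32_t next;                 /* next fast-path cookie */
	uint8_t  used[SEQ_HALF / 8];   /* bitmap for the second half */
};

static void
dir_seq_init(struct dir_seq *ds)
{
	ds->next = 0;
	memset(ds->used, 0, sizeof(ds->used));
}

static uint32_t
dir_seq_alloc(struct dir_seq *ds)
{
	/* Fast path: hand out cookies incrementally from the first half. */
	if (ds->next < SEQ_HALF)
		return ds->next++;

	/* Slow path: find a free slot in the second half. */
	for (uint32_t i = 0; i < SEQ_HALF; i++) {
		if ((ds->used[i / 8] & (1u << (i % 8))) == 0) {
			ds->used[i / 8] |= (uint8_t)(1u << (i % 8));
			return SEQ_HALF + i;
		}
	}
	return SEQ_FAIL;               /* cookie space exhausted */
}

static void
dir_seq_free(struct dir_seq *ds, uint32_t seq)
{
	assert(seq < SEQ_RANGE);
	/* First-half cookies are never reused, so only the second half is tracked. */
	if (seq >= SEQ_HALF) {
		uint32_t i = seq - SEQ_HALF;
		ds->used[i / 8] &= (uint8_t)~(1u << (i % 8));
	}
}
```

In this sketch the common case costs a single increment, and the bookkeeping cost of recycling only appears once a directory has consumed the entire first half.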
Also, while here, add some fixes for tmpfs_unmount().
Should fix PR/47739, PR/47480, PR/46088 and PR/41068.
Thanks to wiz@ for stress testing.
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.
Welcome to 6.99.24
Discussed on tech-kern@ some time ago.
Reviewed by: David Holland <dholland@netbsd.org>
nfsv4 as well as new implementations of nfsv3 and nfsv2.
This import is from tonight's FreeBSD head and is unchanged from there
except for automated munging of rcsids, rearranging of paths, and an
autogenerated files.* file that might or might not be syntactically
valid. (I will check in the script that does this shortly.)
There is not the slightest chance this will configure yet, let alone
compile or run.
link anymore. In a corner case this leaf can be held by a process as its CWD. It
is guaranteed to be empty at this stage, so we truncate it, removing the only valid
FID, the '..' entry.
Solves part of PR kern/47987
Solves tests/vfs/t_vnops udf_dir_rmdirdotdot
VOP_PUTPAGES() was never triggered, resulting in far too much data in the UBC
that needed to be written out. This could result in instability on small
memory machines.
A type specifier of the form
enum identifier
without an enumerator list shall only appear after the type it
specifies is complete.
which means that we cannot pass an "enum vtype" argument to
kauth_access_action() without fully specifying the type first.
Unfortunately there is a complicated include file loop which
makes that difficult, so convert this minimal function into a
macro (and capitalize it).
(ok elad@)
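To illustrate why a macro sidesteps the include-order problem, here is a self-contained sketch with hypothetical names (ACCESS_ACTION, ACTION_EXEC, ACTION_SEARCH and this local enum vtype are not the kauth(9) definitions). The macro is defined before the enum is complete; the enumerator VDIR is only looked up where the macro is expanded, whereas a prototype taking an enum vtype parameter at that point would fall foul of the rule quoted above.

```c
#include <assert.h>

/*
 * "Header" part: enum vtype is not complete yet, so a prototype such as
 *     int access_action(int is_exec, enum vtype vt);
 * is not allowed here.  A macro has no parameter types, so it can be
 * defined anyway.  All names below are hypothetical.
 */
#define ACTION_EXEC    0x1
#define ACTION_SEARCH  0x2
#define ACCESS_ACTION(is_exec, vt) \
	((is_exec) ? ((vt) == VDIR ? ACTION_SEARCH : ACTION_EXEC) : 0)

/* "Later" part: the complete type only becomes visible here. */
enum vtype { VNON, VREG, VDIR };
```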
error and modify all callers to not brelse() on error.
Welcome to 6.99.16
PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
- Implement NGONE to fix caching issue described in PR kern/25070.
Mostly taken from FreeBSD r125637.
- Revert revision 1.70 of smbfs_vnops.c to fix setattr on an opened
directory. In the SMB_CAP_NT_SMBS case, NOPEN is set after the
smbfs_smb_ntcreatex() call. If NOPEN were set beforehand, the function
would return immediately at the check at the do_open label.
- In smbfs_close(), call smbfs_smb_close() and drop the NOPEN bit in
the case of a directory. Otherwise smbfs_rmdir() fails when the
directory was opened.
This uglifies the interface, because several operations need to be
passed the namei flags, and cache_lookup also needs, for the time being,
to be passed cnp->cn_nameiop. Nonetheless, it's a net benefit.
The glop should be able to go away eventually but requires structural
cleanup elsewhere first.
This change requires a kernel bump.
- Move the namecache's hash computation to inside the namecache code,
instead of being spread out all over the place. Remove cn_hash from
struct componentname and delete all uses of it.
- It is no longer necessary (if it ever was) for cache_lookup and
cache_lookup_raw to clear MAKEENTRY from cnp->cn_flags for the cases
that cache_enter already checks for.
- Rearrange the interface of cache_lookup (and cache_lookup_raw) to
make it somewhat simpler, to exclude certain nonexistent error
conditions, and (most importantly) to make it not require write access
to cnp->cn_flags.
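A hypothetical sketch of what computing the hash inside the name cache can look like: the cache derives the key from the parent directory and the component name, so callers no longer need to carry a precomputed cn_hash. namecache_hash() is an invented name, and FNV-1a is used purely as a stand-in; it is not necessarily the hash NetBSD uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: hash a component name together with its parent directory.
 * FNV-1a (32-bit) is an illustrative stand-in for the real hash function.
 */
static uint32_t
namecache_hash(const void *dvp, const char *name, size_t namelen)
{
	uint32_t h = 2166136261u;              /* FNV-1a offset basis */

	for (size_t i = 0; i < namelen; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619u;                /* FNV-1a prime */
	}
	/* Mix in the directory so equal names in different dirs spread out. */
	h ^= (uint32_t)(uintptr_t)dvp;
	return h;
}
```

Only the first namelen bytes are hashed, matching how a componentname points into a longer path buffer rather than holding a NUL-terminated copy.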
This change requires a kernel bump.
Fixes a side issue mentioned in PR kern/46990.
I left this commented to preserve the old behaviour of tmpfs_rename,
but it is obviously broken to omit the cache purge, and I'm surprised
nobody had encountered any problems with it until now.
tmpfs_vnode_get drops all locks except possibly the reclaiming bit
lock to keep the tmpfs node from being reclaimed while we're still
interested in it. Consequently, it does not keep the directory's
existence invariant, so we must check that after tmpfs_vnode_get.
Fixes PR kern/46990. Tested by Wolfgang Stukenbrock.
parent, keeping them active, and allowing lookups of .. without sending
a request to the filesystem.
Enable the feature for perfused, as this is how FUSE works.
be vfs_detach'ed by module autounload before puffs_vfsop_unmount() completes
and has freed its resources from the pools. By holding a reference on
puffs_vfsops from each mount, we ensure that no race can occur here.
Works around the crash in kern/46734
We introduced a slow queue for delayed reclaims, while the existing
queue for unmount, flush and exit has been renamed the fast queue. Both
queues had a timestamp for when an operation should be done, but it was
useless for the fast queue, which is always used to run an operation
ASAP. And the timestamp test had an error that turned ASAP into "at next
tick", but nothing was there to wake the thread at the next tick, hence
the hang. The fix is to remove the useless and buggy timestamp test for
the fast queue.
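A hypothetical model of the bug and the fix described above. struct sop, sop_ready_buggy() and sop_ready_fixed() are invented names, ticks are plain ints, and wraparound is ignored. The buggy check treats an operation stamped with the current tick as not yet due, so it waits for a tick that nobody wakes the worker for; the fixed check never consults the timestamp for the fast queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queued operation; not the puffs data structure. */
struct sop {
	bool fast;   /* on the fast queue: run ASAP */
	int  when;   /* slow queue only: earliest tick to run */
};

/* Buggy: an op stamped "now" is deferred to the next tick. */
static bool
sop_ready_buggy(const struct sop *op, int now)
{
	return op->when < now;
}

/* Fixed: fast-queue ops are always ready; only the slow queue waits. */
static bool
sop_ready_fixed(const struct sop *op, int now)
{
	if (op->fast)
		return true;
	return op->when <= now;
}
```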
The normal kernel behavior is to retain inactive nodes in the freelist
until it runs out of vnodes. This has some merit for local filesystems,
where the cost of an allocation is about the same as the cost of a
lookup. But that is not true for distributed filesystems.
On the other hand, keeping inactive nodes for a long time holds memory
in the file server process, and when the kernel runs out of vnodes, it
produces reclaim avalanches that increase latency for other operations.
We do not reclaim inactive vnodes immediately either, as they may be
looked up again shortly. Instead we introduce a grace time and
reclaim nodes that have been inactive beyond the grace time.
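A minimal sketch of the grace-time policy, assuming each inactive node records the tick at which it became inactive. struct inactive_node and reclaim_expired() are hypothetical; the unsigned subtraction keeps the elapsed-time computation correct across counter wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an inactive node. */
struct inactive_node {
	uint32_t inactive_since;   /* tick at which the node became inactive */
	bool     reclaimed;
};

/* Reclaim nodes inactive for longer than grace ticks; return how many. */
static size_t
reclaim_expired(struct inactive_node *nodes, size_t n, uint32_t now,
    uint32_t grace)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t idle = (uint32_t)(now - nodes[i].inactive_since);
		if (!nodes[i].reclaimed && idle > grace) {
			nodes[i].reclaimed = true;
			count++;
		}
	}
	return count;
}
```

Nodes looked up again before the grace time expires would simply leave the inactive set, which is what makes the delay worthwhile.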
- Fix lookup/reclaim race condition.
The above improvement uncovered a race condition between lookup and
reclaim. If we reclaimed a vnode associated with a userland cookie while
a lookup returning that same cookie was in progress, then the kernel ends
up with a vnode associated with a cookie that has been reclaimed in
userland. The next operation on the cookie will crash (or at least confuse)
the filesystem.
We fix this by introducing a lookup count in kernel and userland. On
reclaim, the kernel sends the count, which enables userland to detect
situations where it initiated a lookup that has not completed in the kernel.
In such a situation, the reclaim must be ignored, as the node is about
to be looked up again.
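A hypothetical userland-side sketch of the lookup-count handshake. The structure and function names are invented, and subtracting the kernel's count is one plausible bookkeeping scheme, not necessarily the one libpuffs uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland node; not the libpuffs structure. */
struct fs_node {
	unsigned lookups;   /* lookups answered for this cookie */
	bool     freed;
};

/* Called each time the filesystem returns this node from a lookup. */
static void
node_looked_up(struct fs_node *n)
{
	n->lookups++;
}

/*
 * Handle a reclaim carrying the number of lookups the kernel completed.
 * If userland answered more, a lookup is still in flight and will
 * re-attach the cookie, so the node must survive.
 */
static bool
node_reclaim(struct fs_node *n, unsigned kernel_lookups)
{
	assert(kernel_lookups <= n->lookups);
	n->lookups -= kernel_lookups;
	if (n->lookups > 0)
		return false;   /* lookup in progress: ignore the reclaim */
	n->freed = true;
	return true;
}
```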
This prevents a possible panic "panic: buf mem pool index 23" later in
vfs_bio.c:buf_mempoolidx().
(I'm not sure if it's okay for getdisksize() to assume that
partinfo taken from DIOCGPART is properly initialized
on all disk(9) devices or not)
See also:
http://mail-index.NetBSD.org/source-changes/2012/06/30/msg035298.html
(and rename it to sysvbfs_file_setsize()) because it's actually
part of vnode ops and bfs.c is also pulled by standalone bootloaders
which don't want vnode header mess.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
To make code in 'external' (etc) still compile, MALLOC_DECLARE() still
has to generate something of type 'struct malloc_type *', with
normal optimisation gcc generates a compile-time 0.
MALLOC_DEFINE() and friends have no effect.
Fix one or two places where the code would no longer compile.
instance when doing a fault-issued VOP_GETPAGES within VOP_WRITE, changing
size leads to panic: genfs_getpages: past eof.
- Handle ticks wraparound for vnode name and attribute timeouts
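A common way to make such timeouts wrap-safe, shown as a minimal sketch: compute the elapsed time as an unsigned difference instead of comparing the current tick against start + ttl. ttl_expired() is a hypothetical helper, and uint32_t is used to avoid signed-overflow issues in this standalone example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: has a TTL of ttl ticks, started at tick start,
 * expired at tick now?  (now - start) in unsigned arithmetic is the
 * elapsed time even when the tick counter has wrapped in between.
 */
static bool
ttl_expired(uint32_t now, uint32_t start, uint32_t ttl)
{
	return (uint32_t)(now - start) >= ttl;
}
```

By contrast, comparing now >= start + ttl goes wrong when start + ttl wraps but now has not yet wrapped: the timeout would appear to have expired immediately.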
may fail leading to a panic in bread().
Replace bread() with getblk() / VOP_STRATEGY() and return
an error if getblk() fails.
Fixes PR#46282: 6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread
This is an interim solution for easy pullup. The final solution
is to change bread() to not return a buffer on error. As
we have to change all callers of bread() this will not qualify
for a pullup.
lookup, create, mknod, mkdir, symlink, getattr and setattr messages
have been extended so that attributes and their TTL can be provided
by the filesystem. lookup, create, mknod, mkdir, and symlink messages
are also extended so that the filesystem can provide name TTL.
simplifying uvm_map handling (no special kernel entries anymore, no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)
releng@ acknowledged
a lock, ignore the node and continue. To allow the cleaning to succeed
the current thread must make progress.
For a brief time the cache may contain more than one vnode referring to
a lower node.
Don't unlock the hash mutex if getnewvnode fails -- we don't hold it.
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:
An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.
A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.
The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.
An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.
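For illustration, here is the simplest of the FIPS 140-2 statistical tests, the monobit test: a 20000-bit sample passes if its count of one bits satisfies 9725 < ones < 10275. This is a standalone sketch, not the code in libkern/rngtest.c; monobit_ok() and RNGTEST_BYTES are hypothetical names, and the poker, runs and long-run tests are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RNGTEST_BYTES 2500   /* 20000 bits, the FIPS 140-2 sample size */

/* FIPS 140-2 monobit test on a 20000-bit sample. */
static bool
monobit_ok(const uint8_t buf[RNGTEST_BYTES])
{
	unsigned ones = 0;

	for (size_t i = 0; i < RNGTEST_BYTES; i++) {
		/* Clear the lowest set bit until the byte is zero. */
		for (uint8_t b = buf[i]; b != 0; b &= (uint8_t)(b - 1))
			ones++;
	}
	return ones > 9725 && ones < 10275;
}
```

A constant 0x55 pattern passes this test, which is why the other three tests are needed to catch obviously non-random output.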
A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.
An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.
In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.
The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.
The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.
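The FIPS continuous-output test mentioned above compares each block from a source with the previous one and flags the source if two consecutive blocks are identical. A minimal sketch, assuming a hypothetical 16-byte block size; struct crngt and crngt_check() are invented names, not the rnd(4) code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRNGT_BLOCK 16   /* hypothetical block size in bytes */

struct crngt {
	bool    primed;               /* has a first block been seen? */
	uint8_t last[CRNGT_BLOCK];    /* previous block from this source */
};

/* Return false if blk repeats the previous block (stuck source). */
static bool
crngt_check(struct crngt *st, const uint8_t blk[CRNGT_BLOCK])
{
	if (st->primed && memcmp(st->last, blk, CRNGT_BLOCK) == 0)
		return false;
	memcpy(st->last, blk, CRNGT_BLOCK);
	st->primed = true;
	return true;
}
```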
A problem with rndctl(8) is fixed -- data structures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.
The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.
Manual pages for the new kernel interfaces are forthcoming.
to force locked vnodes here. It should be impossible to come here
with a nil upper node.
Relock the directory vnode after copyup. A locked union node with an
unlocked upper vnode can no longer exist so make FIXUP() an assertion.
The change did not reach consensus, since only pagedaemon should need it.
Other threads will tolerate sleeping, and problems here are only symptoms
that something is going wrong in memory management. The cause, not the
symptoms, needs to be fixed.
RUMP-visible code. Instead of checking that updateproc (aka ioflush,
aka syncer) will not sleep in PUFFS code, I check for any kernel thread:
after all, none of them are designed to hang waiting for a remote filesystem
operation to complete.
a memory allocation, or a response from the filesystem.
This avoids deadlocks in the following situations:
1) when memory is low: ioflush waits for the filesystem, the filesystem waits
for memory
2) when the filesystem does not respond (e.g.: network outage on a
distributed filesystem)
This is required to avoid data corruption bugs, where a getattr slips
into a setattr operation and sets the size to the stale value
it got from the filesystem. That value is smaller than the one set by
setattr, and the call to uvm_vnp_setsize() triggers a spurious truncate.
The result is a chunk of zeroed data in the file.
Such a situation can easily happen when the ioflush thread issues a
VOP_FSYNC/puffs_vnop_sync/flushvncache/dosetattr while another process
does a sys_stat/VOP_GETATTR/puffs_vnop_getattr.
This mutex on size operations can be removed the day we decide VOP_GETATTR
has to operate on a locked vnode, since the other operations that touch
size already require that.
- Enable VOP tmpfs_whiteout().
- Support ISWHITEOUT in tmpfs_alloc_file().
- Support DOWHITEOUT in tmpfs_remove() and tmpfs_rmdir().
- Make rmdir work on a directory containing whiteouts.
Should fix PR #35112 (tmpfs doesn't play well with unionfs).
Fixes PR kern/36681. tmpfs now survives dirconc, all our vfs/tmpfs
tests and rename races in atf, and a bunch of hand-written tests
that I'd commit if atf didn't find them highly indigestible.
ok dholland
- union_close() has to lock/unlock the lower vnode.
- union_fsync() has to call spec_fsync() for the union vnode.
- union_strategy() must allow writes to devices on the lower file system.
- union_bwrite() was completely missing.
as zero. Make it advertise one (no_trunc == true).
Names longer than NAME_MAX (255) will never pass namei() btw.
Fixes PR #43670 (msdosfs claims support for filenames longer than {NAME_MAX},
but fails)
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
This should fix cross-build problems, but I can't really test
that now, so I am not re-enabling the inclusion of v7fs support
in makefs.
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
deriving va_list as required by standards.
Renaming a file of any non-directory type over another file of any
other non-directory type is OK -- they need not match as long as
neither is a directory, so loosen the kassert to reflect this.
XXX Need to write test cases for this.
ok dholland, rmind
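One way to express the loosened invariant, as a hypothetical sketch: a directory may only replace a directory and a non-directory only a non-directory; beyond that the types need not match. enum ftype and rename_types_ok() are stand-ins, not the vnode types or the actual kassert.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vnode type enumeration. */
enum ftype { FT_REG, FT_LNK, FT_SOCK, FT_FIFO, FT_DIR };

/* Types may differ unless exactly one side is a directory. */
static bool
rename_types_ok(enum ftype from, enum ftype to)
{
	return (from == FT_DIR) == (to == FT_DIR);
}
```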
filesystem in which format extended attribute shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one-byte length-prefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtained by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoids the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
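The conversion that remains in libperfuse goes from the FUSE NUL-terminated format to the length-prefixed one. A minimal sketch of such a conversion, not the libperfuse code; nul_to_prefixlen() is a hypothetical name. Both encodings spend exactly one extra byte per name, so the output is the same size as the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert a list of NUL-terminated names into one-byte length-prefixed
 * names.  Returns the number of bytes written, or (size_t)-1 if the
 * input is malformed (out is then only partially written).
 */
static size_t
nul_to_prefixlen(const char *in, size_t inlen, unsigned char *out)
{
	size_t i = 0;

	while (i < inlen) {
		const char *end = memchr(in + i, '\0', inlen - i);
		if (end == NULL)
			return (size_t)-1;   /* last name is not terminated */
		size_t len = (size_t)(end - (in + i));
		if (len > 255)
			return (size_t)-1;   /* does not fit a one-byte prefix */
		out[i] = (unsigned char)len;
		memcpy(out + i + 1, in + i, len);
		i += len + 1;
	}
	return i;
}
```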
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to a uvm_object and adding a flags argument.
Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. Ubc_purge() no longer panics when unmounting tmpfs.
Keep uvm_vnp_zerorange() until the next kernel version bump.
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
cycle (destruction part). Perform link counting in tmpfs_dir_attach()
and tmpfs_dir_detach(), instead of in alloc/free and at arbitrary places.
Fixes PR/44285, PR/44288, PR/44657 and likely PR/42484.
- Fix the race between the lookup and inode destruction. Fixes PR/43167
and its duplicates PR/40088, PR/40757.
- Improve tmpfs_rename() locking a little, fix kqueue event notifications
and also fix PR/43617. Add simplistic tmpfs_parentcheck_p(); to be
expanded and used for further rename() locking fixes.
- Cache directory entry "hint" in the tmpfs node, add tmpfs_dir_cached(),
and thus avoid unnecessary lookup in tmpfs_remove() and tmpfs_rmdir().
- Set correct _PC_FILESIZEBITS value in tmpfs_pathconf(). Fixes PR/43576.
- A few minor fixes.
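A hypothetical sketch of keeping all link-count changes in the attach and detach paths: attaching a node adds one link for its new directory entry, and attaching a subdirectory also adds one link to the parent for the child's ".." entry. struct tnode, dir_attach() and dir_detach() are simplified stand-ins for the tmpfs code, and the "." self-link is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified tmpfs-like node. */
struct tnode {
	bool          isdir;
	unsigned      links;
	struct tnode *parent;   /* directories only */
};

static void
dir_attach(struct tnode *dir, struct tnode *node)
{
	node->links++;                  /* the new directory entry */
	if (node->isdir) {
		assert(node->parent == NULL);
		node->parent = dir;
		dir->links++;           /* the child's ".." entry */
	}
}

static void
dir_detach(struct tnode *dir, struct tnode *node)
{
	assert(node->links > 0);
	node->links--;
	if (node->isdir) {
		assert(node->parent == dir && dir->links > 0);
		node->parent = NULL;
		dir->links--;
	}
}
```

Keeping the counting symmetric in these two functions means every path that links or unlinks a node gets the same bookkeeping.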
already prevented). File systems are no longer responsible for checking this.
Clean up and add asserts (note that dvp == vp cannot happen in vop_link).
OK dholland@
Maintain a tree of file handles, create nodes from msdosfs_vptofh() and keep
them until either the file gets unlinked or the file system gets unmounted.
Fixes the msdosfs part of PR #43745 (fhopen of an unlinked file causes problems
on multiple file systems)
fixing the return value of tmpfs_fhtovp() in the not-found case.
When vmlocking2 was merged to head (Jan 2008 !!) the inode numbering was
changed. Before, inodes were numbered 2..tm_nodes_max-1; after the
merge, the numbers are derived from the node's memory address.
Fixes PR #43605 (tmpfs file handles are broken)