Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.
This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.
The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.
No functional change.
directory. Treating it as DOT lookup would put garbage into the name
cache and could panic on future lookups.
Seen with ZFS file system exported from OmniOS, an OpenSolaris derivative.
Fixes PR kern/50664 "cd .." over NFS/ZFS can panic kernel
For many reasons, forcibly unmounting a soft NFS mount could hang forever.
Here are the fixes:
- Introduce decents timeouts in operation that awaited NFS server reply.
- On timeout, fails operations on soft mounts with EIO.
- Introduce NFSMNT_DISMNTFORCE to let the filesystem know that a
force unmount is ongoing. This causes timeouts to be reduced and
prevents the NFS client to attempt reconnecting to the NFS server.
Also fix a race condition where some asynchronous I/O could reference
destroyed mount structures. We fix this by awaiting asynchronous I/O
to drain before proceeding.
Reviewed by Chuck Silvers.
functions that do all the ugly work, are just plain copyin/out for the
native system calls, and do the necessary translations for netbsd32.
with this i'm able to run 32 bit nfsd and mountd on 64 bit kernel and
mount the file systems remotely.
nam parameter type from buf * to sockaddr *.
final commit for parameter type changes to protocol user requests
* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.
Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html
The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.
Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
separate functions
xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)
- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()
rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs
- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()
patch reviewed by rmind
find.
The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.
The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.
The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
need to vget() vnodes that we are not interested at, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.
Bump for struct protosw. Welcome to 6.99.62!
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.
- Pass down LK_RETRY to the lock operation (hint for deadfs only).
- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.
- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.
With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.
Adresses PR kern/37706 (Forced unmount of file systems is unsafe)
Discussed on tech-kern.
Welcome to 6.99.33
structures used by both the nfs server and client code. Tested by pgoyette@
1. mount remote fs via nfs (my /home directory), which autoloads nfs module
2. manually modload nfsserver
3. wait a bit
4. manually modunload nfsserver
5. wait a couple minutes
6. verify that client access still works (/bin/ls ~paul home dir)
7. manually modload nfsserver again
8. start an nfsd process
9. wait a bit
10. kill nfsd process
11. wait
12. manually modunload nfsserver again
13. verify continued client access
XXX: Note that nfs_vfs_init() calls nfs_init(), but nfs_vfs_done() does not
call nfs_fini(). Also note that the destruction order is wrong in it,
but probably does not matter. "someone" (!= me) should fix it :-) and
run the above tests.
delayed write errors may get lost.
Change nfs_vinvalbuf() to keep errors from vinvalbuf() for fsync() or close().
Presented on tech-kern@
Fix for PR kern/47980 (NFS over-quota not detected if utimes() called
before fsync()/close())
A type specifier of the form
enum identifier
without an enumerator list shall only appear after the type it
specifies is complete.
which means that we cannot pass an "enum vtype" argument to
kauth_access_action() without fully specifying the type first.
Unfortunately there is a complicated include file loop which
makes that difficult, so convert this minimal function into a
macro (and capitalize it).
(ok elad@)
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)
Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.
No functional change intended.
for openat() and friends) to ni_atdir to avoid confusion with a
previously existing (and, alas, still documented) ni_startdir field
that meant something else entirely.
This uglifies the interface, because several operations need to be
passed the namei flags and cache_lookup also needs for the time being
to be passed cnp->cn_nameiop. Nonetheless, it's a net benefit.
The glop should be able to go away eventually but requires structural
cleanup elsewhere first.
This change requires a kernel bump.
- Move the namecache's hash computation to inside the namecache code,
instead of being spread out all over the place. Remove cn_hash from
struct componentname and delete all uses of it.
- It is no longer necessary (if it ever was) for cache_lookup and
cache_lookup_raw to clear MAKEENTRY from cnp->cn_flags for the cases
that cache_enter already checks for.
- Rearrange the interface of cache_lookup (and cache_lookup_raw) to
make it somewhat simpler, to exclude certain nonexistent error
conditions, and (most importantly) to make it not require write access
to cnp->cn_flags.
This change requires a kernel bump.
pool gets destroyed. Otherwise we are left with a stray pool that points to
unmapped memory behind (and bad things happen). Typically you get seemingly
random page faults (without printing uvm_fault) that happen in various pool
operations. Most frequent one is the pool_drain() from the page daemon.
arc4random() hacks in rump with stubs that call the host arc4random() to
get numbers that are hopefully actually random (arc4random() keyed with
stack junk is not). This should fix some of the currently failing anita
tests -- we should no longer generate duplicate "random" MAC addresses in
the test environment.
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:
An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.
A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.
The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.
An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.
A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.
An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.
In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.
The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.
The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.
A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.
The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.
Manual pages for the new kernel interfaces are forthcoming.
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
already prevented). File systems are no longer responsible to check this.
Clean up and add asserts (note that dvp == vp cannot happen in vop_link).
OK dholland@
so nfsd no longer needed to do a lookup() call immediately afterwards
to retrieve the newly created object.
Since that change there has been no way for ISSYMLINK to be set upon
return from VOP_MKNOD (or before the call to VOP_MKNOD either) so
remove the test for it and associated block of dead code.
(I do not understand how this code was reachable before then either.
The logic in question is only reached if no object by that name
existed, and there's no reasonable way that a successful call to
VOP_MKNOD should ever create a symlink. The code appears to come from
4.4lite; maybe they had locking bugs?)
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.
sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.
Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
to 3 times max RPC size rather than 2 times. Avoids nasty TCP stalls observed
at Panix. Will require increase to sbmax via sysctl for those running really
huge NFS rsize/wsize (>64K).
parent dir) associated with SAVESTART in relookup().
Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.
The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.
Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)