To make code in 'external' (etc) still compile, MALLOC_DECLARE() still
has to generate something of type 'struct malloc_type *', with
normal optimisation gcc generates a compile-time 0.
MALLOC_DEFINE() and friends have no effect.
Fix one or two places where the code would no longer compile.
The M_xxx arg is left on the calls to malloc() and free(),
maybe they could be converted to an enumeration and just saved in
the malloc header (for deep diag use).
Remove the malloc_type from mbuf extension.
Fixes rump build as well.
Welcome to 6.99.6
1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.
2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.
3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.
4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have lead to slight entropy overestimation for some sources.
5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.
ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
- struct vfs_quotactl_args -> struct quotactl_args
- add sys/stdint.h to sys/quotactl.h for clean userland build
- install sys/quotactl.h in /usr/include
- update set lists for same
- add new marshalling code in libquota
- add new unmarshalling code in vfs_syscalls.c
- discard proplib interpreter code in vfs_quotactl.c
- add dispatching code for the 14 quotactl ops in vfs_quotactl.c
- mark the proplib quotactl syscall obsolete
- add a new syscall number for the new quotactl syscall
- change the name of the syscall to __quotactl()
- remove the decl of the old quotactl from quota/quotaprop.h
- add a decl of the new quotactl to sys/quotactl.h
- update the libc build
- update ktruss
- remove proplib marshalling code from libquota
- update copy of syscall table in gdb ppc sources
- hack rumphijack to accomodate new quotactl name (as I recall,
pooka wanted such a name change to simplify something, but I
don't really see what/how)
This change appears to require a kernel version bump for rumpish
reasons.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)
releng@ acknowledged
an element of the SRCS list. This should fix a problem in which build
products were created in the source tree.
Also add a comment about where COMPAT_50 is defined.
implementation. Rewrite pseudodevice code to use cprng_strong(9).
The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.
The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.
Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.
For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.
allows registration of callbacks that can be used later for
cross-secmodel "safe" communication.
When a secmodel wishes to know a property maintained by another
secmodel, it has to submit a request to it so the other secmodel can
proceed to evaluating the request. This is done through the
secmodel_eval(9) call; example:
bool isroot;
error = secmodel_eval("org.netbsd.secmodel.suser", "is-root",
cred, &isroot);
if (error == 0 && !isroot)
result = KAUTH_RESULT_DENY;
This one asks the suser module if the credentials are assumed to be root
when evaluated by suser module. If the module is present, it will
respond. If absent, the call will return an error.
Args and command are arbitrarily defined; it's up to the secmodel(9) to
document what it expects.
Typical example is securelevel testing: when someone wants to know
whether securelevel is raised above a certain level or not, the caller
has to request this property to the secmodel_securelevel(9) module.
Given that securelevel module may be absent from system's context (thus
making access to the global "securelevel" variable impossible or
unsafe), this API can cope with this absence and return an error.
We are using secmodel_eval(9) to implement a secmodel_extensions(9)
module, which plugs with the bsd44, suser and securelevel secmodels
to provide the logic behind curtain, usermount and user_set_cpu_affinity
modes, without adding hooks to traditional secmodels. This solves a
real issue with the current secmodel(9) code, as usermount or
user_set_cpu_affinity are not really tied to secmodel_suser(9).
The secmodel_eval(9) is also used to restrict security.models settings
when securelevel is above 0, through the "is-securelevel-above"
evaluation:
- curtain can be enabled any time, but cannot be disabled if
securelevel is above 0.
- usermount/user_set_cpu_affinity can be disabled any time, but cannot
be enabled if securelevel is above 0.
Regarding sysctl(7) entries:
curtain and usermount are now found under security.models.extensions
tree. The security.curtain and vfs.generic.usermount are still
accessible for backwards compat.
Documentation is incoming, I am proof-reading my writings.
Written by elad@, reviewed and tested (anita test + interact for rights
tests) by me. ok elad@.
See also
http://mail-index.netbsd.org/tech-security/2011/11/29/msg000422.html
XXX might consider va0 mapping too.
XXX Having a secmodel(9) specific printf (like aprint_*) for reporting
secmodel(9) errors might be a good idea, but I am not sure on how
to design such a function right now.
arc4random() hacks in rump with stubs that call the host arc4random() to
get numbers that are hopefully actually random (arc4random() keyed with
stack junk is not). This should fix some of the currently failing anita
tests -- we should no longer generate duplicate "random" MAC addresses in
the test environment.
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:
An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.
A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.
The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.
An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.
A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.
An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.
In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.
The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.
The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.
A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.
The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.
Manual pages for the new kernel interfaces are forthcoming.
ranges that include the least and the greatest vmem_addr_t. Update
vmem(9) uses throughout the kernel. Slightly expand on the tests in
subr_vmem.c, which still pass. I've been running a kernel with this
patch without any trouble.
address in a vmem(9) arena, 0) and VMEM_ADDR_MAX (the maximum possible
address, currently 0xFFFFFFFF). Modify several boundary conditions so
that a vmem(9) arena can allocate ranges including VMEM_ADDR_MAX.
Update documentation and tests.
These changes pass the tests in sys/kern/subr_vmem.c. To compile the
and run the test program, run "cd sys/kern/ && gcc -DVMEM_SANITY -o
subr_vmem ./subr_vmem.c && ./subr_vmem".
- Make sure to consume complete path components.
- Consume trailing slashes too.
- Do not clear REQUIREDIR.
Test rump/modautoload/t_modautoload now passes.
- On lookup it is ok to create if the name exists and is a whiteout
- When replacing a whiteout directory entry remove the whiteout first.
- Set UF_OPAQUE when creating a node in place of a whiteout.
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.
- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55
Adresses PR kern/38762 panic: vwakeup: neg numoutput
No objections from tech-kern@.
filesystem in which format extended attribute shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one byte length-pprefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtanined by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoid the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.
Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. Ubc_purge() no longer panics when unmounting tmpfs.
Keep uvm_vnp_zerorange() until the next kernel version bump.
don't build the uvm_object.c uvm_object_printit() for _RUMPKERNEL. (XXX)
add empty panic() stubs for uvm_loanbreak() and ubc_purge().
fixes some more 5.99.53 rump build issues.
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
Push -Wno-array-bounds down to the cases that depend on it.
Selectively disable warnings for 3rd party software or non-trivial
issues to be reviewed later to get clang -Werror to build most of the
tree.
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).
MSLT and VTW were contributed by Coyote Point Systems, Inc.
Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.
Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.
A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.
sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.
Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is build on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and ajust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
yesterday the CARP test stopped working, since CARP depends on
IFF_PROMISC (which was previously always accidentally enabled).
While making the interface honor IFF_PROMISC, also make it compare
the received frame's address against ifp->if_sadl instead of a
local enaddr value we cached when the interface was created.
This fixes a rather curious forwarding/redirect/etc. storm which
happened when there were >2 shmif kernels on the same shmbus with
ip forwarding set on. (at least it stress-tested other code ;)
appears that using nxr to search for users wasn't a very good idea.
Put networking back and make the test of the defines give out
#errors.
me be fixink this
the exec handshake to return.
In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.
See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
return false. This makes rump lfs unmount work on platforms which
use the pmap stub (i.e. non-x86, which already returned false here).
Otherwise, lfs would hang itself trying to flush some buffers but
couldn't fill a segment and therefore wouldn't actually write
anything.
Fixes nfs{,ro}_fileio tests on at least sparc64 (and probably macppc
and other fat endian machines).
The problem was that nfs was fooled to thinking read() caused a
write fault because of VM_PROT_WRITE being unconditionally set and
therefore set NMODIFIED on a r/o file system. It is absolutely
beyond me why the test worked on i386/amd64. Incidentally, I seem
to have "misplaced" a few goats.
in tmpfs_node), so when playing with pages make sure we lock the
uvm object the pages belong to instead of the vnode's uvm object.
per test from Nicolas Joly (which I'm sure he will commit soon ;)
pageouts active and give up only if the pagedaemon could not free
memory and there are no outstanding pageouts.
This should fix the "out of memory" pauses reported by Mihai Chelaru
and Taylor R Campbell. Tested by copying files to and from an ffs
backed by /dev/wd0 (with and without -o log) using a 1MB rump kernel
memory limit.