- Separate the suser part of the bsd44 secmodel into its own secmodel
and directory, pending even more cleanups. For revision history
purposes, the original location of the files was
src/sys/secmodel/bsd44/secmodel_bsd44_suser.c
src/sys/secmodel/bsd44/suser.h
- Add a man-page for secmodel_suser(9) and update the one for
secmodel_bsd44(9).
- Add a "secmodel" module class and use it. Userland program and
documentation updated.
- Manage secmodel count (nsecmodels) through the module framework.
This eliminates the need for secmodel_{,de}register() calls in
secmodel code.
- Prepare for secmodel modularization by adding relevant module bits.
The secmodels don't allow auto unload. The bsd44 secmodel depends
on the suser and securelevel secmodels. The overlay secmodel depends
on the bsd44 secmodel. As the module class is only cosmetic, and to
prevent ambiguity, the bsd44 and overlay secmodels are prefixed with
"secmodel_".
- Adapt the overlay secmodel to recent changes (mainly vnode scope).
- Stop using link-sets for the sysctl node(s) creation.
- Keep sysctl variables under nodes of their relevant secmodels. In
other words, don't create duplicates for the suser/securelevel
secmodels under the bsd44 secmodel, as the latter is merely used
for "grouping".
- For the suser and securelevel secmodels, "advertise presence" in
relevant sysctl nodes (sysctl.security.models.{suser,securelevel}).
- Get rid of the LKM preprocessor stuff.
- As secmodels are now modules, there's no need for an explicit call
to secmodel_start(); it's handled by the module framework. That
said, the module framework was adjusted to properly load secmodels
early during system startup.
- Adapt rump to changes: Instead of using empty stubs for securelevel,
simply use the suser secmodel. Also replace secmodel_start() with a
call to secmodel_suser_start().
- 5.99.20.
Testing was done on i386 ("release" build). Spearated module_init()
changes were tested on sparc and sparc64 as well by martin@ (thanks!).
Mailing list reference:
http://mail-index.netbsd.org/tech-kern/2009/09/25/msg006135.html
definition, drvctl_init() is not called, the drvctl_eventq is not
initialized, and the kernel will panic in devmon_insert() when a
device is detached.
Thanks to Jared McNeill for pointing out the panic.
Since VNODE_LOCKDEBUG has never been generally useful, default to
off. However, the checks can still be generated by flipping the
switch for the isolated cases where this form of dynamic analysis
is useful and the person using it knows what she is doing.
lookup_for_nfsd(). This code is, or at least should be, the same as
the regular symlink-following code plus an extra flag nfsd needs.
The two lots of code can/will be merged in the future.
private backdoor entry point this is.
Also, clone the lookup_for_nfsd() entry point as
lookup_for_nfsd_index(), for use by a different call site in nfsd that
does different unclean things with nameidata.
and make suspension by self, by drvctl(8), and by ACPI system sleep
play nice together. Start solidifying some temporary API changes.
1. Extract a new header file, <sys/device_if.h>, from <sys/device.h> and
#include it from <sys/pmf.h> instead of <sys/device.h> to break the
circular dependency between <sys/device.h> and <sys/pmf.h>.
2. Introduce pmf_qual_t, an aggregate of qualifications on a PMF
suspend/resume call. Start to replace instances of PMF_FN_PROTO,
PMF_FN_ARGS, et cetera, with a pmf_qual_t.
3. Introduce the notion of a "suspensor," an entity that holds a
device in suspension. More than one suspensor may hold a device
at once. A device stays suspended as long as at least one
suspensor holds it. A device resumes when the last suspensor
releases it.
Currently, the kernel defines three suspensors,
3a the system-suspensor: for system suspension, initiated
by 'sysctl -w machdep.sleep_state=3', by lid closure, by
power-button press, et cetera,
3b the drvctl-suspensor: for device suspension by /dev/drvctl
ioctl, e.g., drvctl -S sip0.
3c the system self-suspensor: for device drivers that suspend
themselves and their children. Several drivers for network
interfaces put the network device to sleep while it is not
administratively up, that is, after the kernel calls if_stop(,
1). The self-suspensor should not be used directly. See
the description of suspensor delegates, below.
A suspensor can have one or more "delegates". A suspensor can
release devices that its delegates hold suspended. Right now,
only the system self-suspensor has delegates. For each device
that a self-suspending driver attaches, it creates the device's
self-suspensor, a delegate of the system self-suspensor.
Suspensors stop a system-wide suspend/resume cycle from waking
devices that the operator put to sleep with drvctl before the cycle.
They also help self-suspension to work more simply, safely, and in
accord with expectations.
4. Add the notion of device activation level, devact_level_t,
and a routine for checking the current activation level,
device_activation(). Current activation levels are DEVACT_LEVEL_BUS,
DEVACT_LEVEL_DRIVER, and DEVACT_LEVEL_CLASS, which respectively
indicate that the device's bus is active, that the bus and device are
active, and that the bus, device, and the functions of the device's
class (network, audio) are active.
Suspend/resume calls can be qualified with a devact_level_t.
The power-management framework treats a devact_level_t that
qualifies a device suspension as the device's current activation
level; it only runs hooks to reduce the activation level from
the presumed current level to the fully suspended state. The
framework treats a devact_level_t qualifying device resumption
as the target activation level; it only runs hooks to raise the
activation level to the target.
5. Use pmf_qual_t, devact_level_t, and self-suspensors in several
drivers.
6. Temporarily add an unused power-management workqueue that I will
remove or replace, soon.
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).
tested by booting a kernel in qemu and compile-testing i386/ALL
In the for(;;) loop of turnstile_block(), the lock owner can change while
cur's lock is released (cur's lock is also the tschain_t's mutex).
Remove the KASSERT about owner being invariant and try to deal with the
fact that the owner can change instead.
http://mail-index.netbsd.org/tech-kern/2009/08/24/msg005957.html
and followups.
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.
In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.
Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.
In support of the changes above,
1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().
2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.
3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
since they are only peripherially related to the autoconf subsystem
and more related to boot initialization. Also, apply _KERNEL_OPT
to autoconf where necessary.
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not sync. This makes solocked2() and solocked()
unreliable and cause DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html
sysctl_name[0], so write that instead of sysctl_name[sz] (where sz
just happened to be set to 0 in the previous line).
Also in sysctl_create(), give the length of the sysctl_name its
own variable, nsz, and reserve sz for expressing the size of the
node's value.
No functional change intended.
a simple attempt instead of an authoritative answer. The failure of the
rootspec test could me machine-dependant. Thanks to martin@ for pointing
that out.
iterator and the length of the old child array, but introduce a
new variable, 'olen', for the latter purpose.
In sysctl_alloc(), name a constant.
Introduce sysctl_log_print(), a handy debug routine.
No functional changes intended.
yet in service, have a correct pointer to their parent, sysctl_parent.
This fixes a bug where sysctl_teardown(9) could not clean up a
network interface's sysctl(9) trees when I detached it, because
the wrong log had been recorded.
support -ve lengths (lock area before current offset).
Nothing in libc or the kernel allowed for this, so some random part
of the file would get locked (no idea which bits).
Although this could probably be fixed in libc, the stubs for posix file
locks for emulations could easily get into the kernel with -ve lengths.
So fixing in the kernel avoids those problems.
This also fixes PR/41620 (attempting to lock negative offsets) - which
is what I was looking into!
and vtime > 0. It should be allowed to go to sleep for the sleep interval
indicated in vtime. Reported by der Mouse a long while ago, and this is what
other unixes do.
Factor out common code of chroot-like syscalls into change_root() and export
that function for use in other parts of the kernel.
Rename change_dir() to chdir_lookup() as the latter describes better what
the function does. While there, move the namei_data initialisation into
chdir_lookup(), too. And export chdir_lookup().
mq_prio_max is dynamic, and sorted list is used for custom setup, when
user manually sets higher priority range.
- Cache mq->mq_attrib in some places. Change msg_ptr type to uint8_t.
- Update copyright, misc.
taking a reference to curlwp's by calling fd_hold(). If lwp_create()
is called from fork1(), then l2 != curlwp, but l2's and not curlwp's
filedesc_t whose reference we should take.
This change stops the problem I describe in
<http://mail-index.netbsd.org/tech-kern/2009/07/09/msg005422.html>,
where /dev/rsd0a is never properly closed after fsck / runs on it.
This change seems to quiet my USB backup drive, sd0 at scsibus0 at
umass0, which had stopped spinning down when it was not in use:
The unit probably stayed open after mount(8) tried (and failed:
errant fstab entry) to mount it.
I am confident that this change is an improvement, but I doubt that
it is the last word on the matter. I hate to get under the filedesc_t
abstraction by fiddling with fd_refcnt, and there may be something
I have missed, so somebody with greater understanding of the file
descriptors code should have a look.
must not allocate a pmf_event_workitem_t using kmem_alloc(9). Use
pool_cache(9), instead, because it is safe in interrupt context.
Thanks, rmind@, for catching the problem and suggesting the solution.
latter might lose its KAUTH_GENERIC_ISSUSER check soon, add an internal
function, mqueue_access(), and call genfs_can_access() from it instead
so we don't pollute the main code path once we need to add a special
kauth(9) check for message queues.
No functional change, error codes preserved.
Related mailing list thread:
http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005311.html
check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr
All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.
XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
case functionality of namei in a simple package with only a couple flags.
A substantial majority of the namei call sites in the kernel can use
this interface; this will isolate those areas from the changes arising
as the internals of namei are fumigated.
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:
- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.
Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
(void *)pew is one way to get a struct work *, but let's
write&pew->pew_work, instead. It is more defensive and persuasive.
Make miscellaneous changes in support of tearing down arbitrary
stacks of filesystems and devices during shutdown:
1 Move struct shutdown_state, shutdown_first(), and shutdown_next(),
from kern_pmf.c to subr_autoconf.c. Rename detach_all() to
config_detach_all(), and move it from kern_pmf.c to subr_autoconf.c.
Export all of those routines.
2 In pmf_system_shutdown(), do not suspend user process scheduling, and
do not detach all devices: I am going to do that in cpu_reboot(),
instead. (Soon I will do it in an MI cpu_reboot() routine.) Do still
call PMF shutdown hooks.
3 In config_detach(), add a DIAGNOSTIC assertion: if we're exiting
config_detach() at the bottom, alldevs_nwrite had better not be 0,
because config_detach() is a writer of the device list.
4 In deviter_release(), check to see if we're iterating the device list
for reading, *first*, and if so, decrease the number of readers. Used
to be that if we happened to be reading during shutdown, we ran the
shutdown branch. Thus the number of writers reached 0, the number
of readers remained > 0, and no writer could iterate again. Under
certain circumstances that would cause a hang during shutdown.
filesystem is mounted. Synchronize access to the number with a
mutex. When a struct mount, mp, is allocated, assign the current
generation number to mp->mnt_gen. Introduce vfs_unmount_forceone()
that forcefully unmounts the most recently mounted filesystem.
Refactor: extract vfs_shutdown1() from vfs_shutdown(). Extract
vfs_sync_all() from vfs_shutdown1().
Print more progress indications while we're unmounting all of the
filesystems during shutdown.
We increase the reference count on mp before calling dounmount(mp),
but we do not decrease it if dounmount(mp) fails, and neither does
dounmount(mp). So decrease the reference count if dounmount(mp)
fails.
Change the loop terminating condition in vfs_unmountall1() to (mp
!= (void *)&mountlist) from !CIRCLEQ_EMPTY(&mountlist), because we
may not ever empty the list, especially if we're not forcing the
filesystems to unmount.
the other routines of the same spirit.
Adjust file-system code to use it.
Keep vaccess() for KPI compatibility and to keep element of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).
No objections on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
by making the pps count time stamp and the update
time stamp u_int64.
The time delta between two PPS events can now
be correctly calculated avoiding any unaccounted
for wraps with 32-bit counters.
When I originally wrote this, I was going for maximum flexibility.
However, after a private discussion with dholland@, I see how this
will cause problems with the future world order of namei whenever
that might be. At the moment, I don't need the extra flexibility,
but if something comes up this may have to be revisited.
mq_send() should fail due to permissions. Noted by Stathis Kamperis!
- Check for empty message queue name (POSIX does not allow this for regular
files, and it's weird), check for DTYPE_MQUEUE, fix permission check in
mq_unlink(), clean up.
these commits move all path handling into module_do_load() from
kobj_load_file(). This way the final path used to load a module
is available for loading <module>.plist, which will store parameters
for a module. The end goal of this project is good support for
MODULAR device drivers.
- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.
Some measurements with libmicro:
- simple file syscalls are like close() are between 1 to 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.
- Fix a preemption bug in CURCPU_IDLE_P() that can lead to a bogus
assertion failure on DEBUG kernels.
- Fix MP/preemption races with timecounter detachment.
- Fix a preemption bug in CURCPU_IDLE_P() that can lead to a bogus
assertion failure on DEBUG kernels.
- Fix MP/preemption races with timecounter detachment.
this fixes the following deadlock.
a thread doing getcleanvnode:
pick a vnode
acqure v_interlock
v_usecount++
call vclean
now, another thread doing cache_lookup:
picks the vnode
vtryget succeed
vn_lock succeed
now in vclean:
set VI_XLOCK (too late to be noticed by the competing thread)
wait on the vnode lock (this might violate locking order)
the use of a flag bit was suggested by Andrew Doran. PR/41374.
Without it, on ports where splhigh() is inline, the compiler will optimise
the second SOFTINT_PENDING test in softint_schedule(). A dissasembly
of softint_schedule() with and without the volatile sh_flags confirm this
on sparc.
Because of this there is a race that could lead to the softhand_t
being enqueued twice on si_q, leading to a corrupted queue and
some handler being SOFTINT_PENDING but never called.
Should fix PR kern/38637
way they can be included without having to include DDB.
(arguably all print routines should be behind #ifdef DEBUGPRINT
and options DDB should define that macro, but I'll tackle that later)
a new struct mount-allocation routine, vfs_mountalloc(9). Documentation
updates will follow.
Attention: Synchronization Oversight Committee! In mount_domount(),
I postpone the call mutex_enter(&mp->mnt_updating) until right before
the VFS_MOUNT(9) call because (1) that looks to me like the earliest
possible opportunity for mp to become visible to any other LWP, because
it was just kmem_zalloc(9)'d and (2) it made extracting the common code
much easier. Tell me if my reasoning is faulty.
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to to proc_sesshold() and
proc_sessrele(). The later releases proc_lock now.
Quick OK by <ad>.
sometimes do not serve as memory barriers, allowing memory references to
bleed outside of critical sections. It's possible that this is the
reason for pkgbuild's longstanding crashiness.
For rwlocks, always enable the explicit membars. They were disabled only
on x86, and since they are not in the fast-path it's not a big deal.
TODO: convert these to an atomic_membar_foo() or similar that does ordering
between regular data references and atomic references.
- Add interrupt shielding (direct hardware interrupts away from the
specified CPUs). Not documented just yet but will be soon.
- Redo /dev/cpu time_t compat so no kernel changes are needed.
x86:
- Make intr_establish, intr_disestablish safe to use when !cold.
- Distribute hardware interrupts among the CPUs, instead of directing
everything to the boot CPU.
- Add MD code for interrupt sheilding. This works in most cases but there is
a bug where delivery is not accepted by an LAPIC after redistribution. It
also needs re-balancing to make things fair after interrupts are turned
back on for a CPU.
over and over to detach all of the devices. Stop when we cannot detach
even a single device in a cycle. Call shutdown hooks on all of the
devices that remain attached.
This is another step toward the detach/unmount cycle that will help us
tear down arbitrary stacks of filesystems, ccd(4), raid(4), and vnd(4).
filesystem at all, false otherwise. This will support tearing down
stacks of filesystems, ccd(4), raid(4), and vnd(4).
Change the misleading variable name 'allerror' to 'any_error'. Make it
a bool.
don't forget to set m_len to 0. Otherwise whatever will compute the size
of this chain (including s_split() itself if called again on this chain)
will get it wrong, leading to various issues.
Bug exposed by the NFS server code with linux clients using TCP mounts.
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
provided is "too large" (log10(2^64) = 19).
(It can still overflow if the input value is close to 2^64 but I don't
consider this a problem.)
fixes nonsense displayed as "total memory" on boot
Call the detach routine for every device in the device tree, starting
with the leaves and moving toward the root, expecting that each
(pseudo-)device driver will use the opportunity to gracefully commit
outstandings transactions to the underlying (pseudo-)device and to
relinquish control of the hardware to the system BIOS.
Detaching devices is not suitable for every shutdown: in an emergency,
or if the system state is inconsistent, we should resort to a fast,
simple shutdown that uses only the pmf(9) shutdown hooks and the
(deprecated) shutdownhooks. For now, if the flag RB_NOSYNC is set in
boothowto, opt for the fast, simple shutdown.
Add a device flag, DVF_DETACH_SHUTDOWN, that indicates by its presence
that it is safe to detach a device during shutdown. Introduce macros
CFATTACH_DECL3() and CFATTACH_DECL3_NEW() for creating autoconf
attachments with default device flags. Add DVF_DETACH_SHUTDOWN
to configuration attachments for atabus(4), atw(4) at cardbus(4),
cardbus(4), cardslot(4), com(4) at isa(4), elanpar(4), elanpex(4),
elansc(4), gpio(4), npx(4) at isa(4), nsphyter(4), pci(4), pcib(4),
pcmcia(4), ppb(4), sip(4), wd(4), and wdc(4) at isa(4).
Add a device-detachment "reason" flag, DETACH_SHUTDOWN, that tells the
autoconf code and a device driver that the reason for detachment is
system shutdown.
Add a sysctl, kern.detachall, that tells the system to try to detach
every device at shutdown, regardless of any device's DVF_DETACH_SHUTDOWN
flag. The default for kern.detachall is 0. SET IT TO 1, PLEASE, TO
HELP TEST AND DEBUG DEVICE DETACHMENT AT SHUTDOWN.
This is a work in progress. In future work, I aim to treat
pseudo-devices more thoroughly, and to gracefully tear down a stack of
(pseudo-)disk drivers and filesystems, including cgd(4), vnd(4), and
raid(4) instances at shutdown.
Also commit some changes that are not easily untangled from the rest:
(1) begin to simplify device_t locking: rename struct pmf_private to
device_lock, and incorporate device_lock into struct device.
(2) #include <sys/device.h> in sys/pmf.h in order to get some
definitions that it needs. Stop unnecessarily #including <sys/device.h>
in sys/arch/x86/include/pic.h to keep the amd64, xen, and i386 releases
building.
This will be used to support TLS. The MD method must match the ELF TLS spec
for that CPU architecture (if there is a spec).
At this time it is only implemented for i386, where it means setting the
per-thread base address for %gs. Please implement this for your platform!
signals (i.e. SA_KILL), just if SIGKILL (or SIGCONT). Improve comments.
Make some functions static, remove unused sigrealloc() prototype.
Fixes PR/39814. Similar patch reviewed by <ad>.
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if availble.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most espcially
chuq and yamt for critical suggestions that lead to this patch not
having a special ugliness i wasn't happy with anyway :-)
There are also sigtimedwait(2) et al. to catch signals without invoking
a signal handler. Fixes PR kern/41076 by Matteo Beccati (the first
test case, where the signal is sent before sigwaitinfo(2) gets called).
Fix numerous problems:
1. LDT updates are not atomic.
2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. System with high maxprocs can be paniced.
3. LDTR can be leaked over context switch.
4. GDT slot allocations can race, giving the same LDT slot to two procs.
5. Incomplete interrupt/trap frames can be stacked.
6. In some rare cases segment faults are not handled correctly.