- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.
Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).
Discussed on <tech-kern>, reviewed by <ad>.
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to to proc_sesshold() and
proc_sessrele(). The later releases proc_lock now.
Quick OK by <ad>.
into modules. By and large this commit:
- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
we no longer need to guard against access from hardware interrupt handlers.
Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:
- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.
- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.
- The system spends less time at IPL_SCHED, and there is less lock activity.
(xxx_INIT to xxx_HEAD_INITIALIZER). Drop code which inits
non-auto (global or static) variables to 0 since that's
already implied by being non-auto. Init some static/global
cpu_simple_locks at compile time.
tech-kern:
- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (non received!)
The p_limit field (for a process) is only be changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.
by yamt@.
- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.
- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.
- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.