Add schedctl(8) - a program to control scheduling of processes and threads.
Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;
Proposed on: <tech-kern>. Reviewed by: <ad>.
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.
(xxx_INIT to xxx_HEAD_INITIALIZER). Drop code which inits
non-auto (global or static) variables to 0 since that's
already implied by being non-auto. Init some static/global
cpu_simple_locks at compile time.
tech-kern:
- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.
The following lines in the kernel config enables the SCHED_M2:
no options SCHED_4BSD
options SCHED_M2
The scheduler seems to be stable. Further work will come soon.
http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.htmlhttp://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!
by yamt@.
- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.
- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.
- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
- Better detect simple cycles of threads calling _lwp_wait and return
EDEADLK. Does not handle deeper cycles like t1 -> t2 -> t3 -> t1.
- If there are multiple threads in _lwp_wait, then make sure that
targeted waits take precedence over waits for any LWP to exit.
- When checking for deadlock, also count the number of zombies currently
in the process as potentially reapable. Whenever a zombie is murdered,
kick all waiters to make them check again for deadlock.
- Add more comments.
Also, while here:
- LOCK_ASSERT -> KASSERT in some places
- lwp_free: change boolean arguments to type 'bool'.
- proc_free: let lwp_free spin waiting for the last LWP to exit, there's
no reason to do it here.
with newlock2 merge:
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also be possible to add the times to the processes profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.
Restores source compatibility with pre-newlock2 tools like ps or top.
Reviewed by Andrew Doran.
Add _lwp_getspecific_by_lwp() to get lwp specific data from other lwp's.
Protected by #ifdef _LWP_API_PRIVATE.
Approved by: Jason Thorpe <thorpej@netbsd.org>
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.