Add schedctl(8) - a program to control scheduling of processes and threads.
Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;
Proposed on: <tech-kern>. Reviewed by: <ad>.
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.
tech-kern:
- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;
This makes cpuctl(8) work with SCHED_M2.
OK by <ad>.
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.
The following lines in the kernel config enables the SCHED_M2:
no options SCHED_4BSD
options SCHED_M2
The scheduler seems to be stable. Further work will come soon.
http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.htmlhttp://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.
- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
with newlock2 merge:
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
On i386 (at least) gcc manages two generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also be possible to add the times to the processes profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.
Restores source compatibility with pre-newlock2 tools like ps or top.
Reviewed by Andrew Doran.