* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code
=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)
=== New schedclk() mechansism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.
=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
be incorporated directly (with greater wieght) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
* Increase the size of sigset_t to accomodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids uses
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.
Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.
UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.
this is the rest of the MI portion changes.
this will be KNF'd shortly. :-)
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.