* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code
=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)
=== New schedclk() mechansism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.
=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
be incorporated directly (with greater wieght) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
is running (and NTP is not enabled), the adjtime()-handling code clobbers
any tickfix that may be necessary for systems with clocks with frequency
greater than 1000Hz.
and the "kernel.tar.Z" distribution on louie.udel.edu, which is older than
xntp 3.4y or 3.5a, but contains newer kernel source fragments.
This commit adds support for a new kernel configuration option, NTP.
If NTP is selected, then the system clock should be run at "HZ", which
must be defined at compile time to be one value from:
60, 64, 100, 128, 256, 512, 1024.
Powers of 2 are ideal; 60 and 100 are supported but are marginally less
accurate.
If NTP is not configured, there should be no change in behavior relative
to pre-NTP kernels.
These changes have been tested extensively with xntpd 3.4y on a decstation;
almost identical kernel mods work on an i386. No pulse-per-second (PPS)
line discipline support is included, due to unavailability of hardware
to test it.
With this in-kernel PLL support for NetBSD, both xntp 3.4y and xntp
3.5a user-level code need minor changes. xntp's prototype for
syscall() is correct for FreeBSD, but not for NetBSD.
(1) text calculation incorrect (would 'overbill')
(2) data calculation incorrect (would 'overbill')
(3) the maxrss calculation uses stuff which isn't present
on the sparc.
if 3/4 tests are questionable and/or broken, well...
* cleaned up hardclock() to avoid checking "p" multiple times, and avoid a
gcc2 possible-use-before-initialisation warning.
* changed softclock() timeout callback functions to be of type timeout_t -
a pointer to a void fn(int). No-one was using the second, tick, argument
that was being passed to these callbacks - it is much cleaner to drop the
thing entirely, rather than add a whole heap of casts of dubious
correctness to calls to timeout(), etc. The old style is kept in an
#ifdef, for future reference.