"earliest" firing callout in a bucket. This allows us to skip
the scan up the bucket if no callouts are due in the bucket.
A cheap O(1) hint update is done at callout insertion (if new callout
is earlier than hint) and removal (is bucket empty). A thorough
refresh of the hint is done when the bucket is traversed.
This doesn't matter much on machines with small values of hz
(e.g. i386), but on systems with large values of hz (e.g. Alpha),
it has a definite positive effect.
Also, keep the callwheel stats in evcnts, so that you can view them
with "vmstat -e".
defined, call addupc_intr() directly from statclock() in the system time case,
using the same P_OWEUPC path if the copyin/copyout fails.
Use this in i386 to remove profiling code from the normal userret() path.
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.
Stops sleeps from returning early (by up to a clock tick), and return 0
ticks for timeouts that should happen now or in the past.
Returning 0 is different from the legacy hzto() interface, and callers
need to check for it.
"KERN_SYSVIPC_SEM_INFO" and "KERN_SYSVIPC_SHM_INFO" to return the
info and data structures for the relevent SysV IPC types. The return
structures use fixed-size types and should be compat32 safe. All
user-visible changes are protected with
#if !defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)
Make all variable declarations extern in msg.h, sem.h and shm.h and
add relevent variable declarations to sysv_*.c and remove unneeded
header files from those .c files.
Make compat14 SysV IPC conversion functions and sysctl_file() static.
Change the data pointer to "void *" in sysctl_clockrate(),
sysctl_ntptime(), sysctl_file() and sysctl_doeproc().
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.
The old timeout()/untimeout() API has been removed from the kernel.
approximation of reality if the MD code doesn't. This variable is the
equivalent of "tickfix" for the non-NTP path.
This allows an alpha kernel (where hz=1024) with "options NTP" to
synch up quite nicely (as opposed to having an frequency error of
~560ppm, which is outside the capture range of the PLL).
that is priority is rasied. Add a new spllowersoftclock() to provide the
atomic drop-to-softclock semantics that the old splsoftclock() provided,
and update calls accordingly.
This fixes a problem with using the "rnd" pseudo-device from within
interrupt context to extract random data (e.g. from within the softnet
interrupt) where doing so would incorrectly unblock interrupts (causing
all sorts of lossage).
XXX 4 platforms do not have priority-raising capability: newsmips, sparc,
XXX sparc64, and VAX. This platforms still have this bug until their
XXX spl*() functions are fixed.
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code
=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)
=== New schedclk() mechansism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.
=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
be incorporated directly (with greater wieght) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
is running (and NTP is not enabled), the adjtime()-handling code clobbers
any tickfix that may be necessary for systems with clocks with frequency
greater than 1000Hz.