some things (e.g. unionfs) may depend on it. It's currently ok
for vnodecovered to be set already; it's not for v_mountedhere in
the vnode, though.
From John Darrow.
XXX should probably just extend VFS_MOUNT to take the vnode pointer as
an argument.
since the lock may be taken again. This was the intention of the CANRECURSE
lock already there, but didn't work.
Only fill in the vnode<->mountpoint links (mountedhere and vnodecovered)
after VFS_MOUNT returned succesfully. It might happen that something called
from VFS_MOUNT mistook the vnode for an already successfully mounted on
one because of this.
could be done in one of 2 ways:
* call lk_init with LK_CANRECURSE, resulting in a lock that
always can be used recursively.
* call lockmgr with LK_CANRECURSE, meaning that it's ok if this
lock is already held by us.
Sometimes we need a locking type that says: take this lock now, exclusively,
but while I am holding it, I may go through a code path which could attempt
to get the lock again, and which is unaware that the lock might already
be taken. Implement LK_SETRECURSE for this purpose. Assume that locks and
unlocks come in matching pairs (they should), and check for this 'level'
using SETRECURSE locks.
to pass down a locked node. Modify union_copyup() to call VOP_CLOSE
locked nodes.
Also fix a bug in union_copyup() where a lock on the lower vnode would
only be released if VOP_OPEN didn't fail.
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code
=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)
=== New schedclk() mechansism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.
=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
be incorporated directly (with greater wieght) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
Also:
* Do the boundary check when creating a new region as well.
* If we crossed the boundary, don't just throw away the region; lop off the
beginning and see if we still fit.
SB16 is now fully functional on the Alpha.
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also elimiates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessiates the addition
of a few casts.
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.
parents for children's p_estcpu. I think this problem has always been there.
It's particularly noticable with X because the server builds up non-trivial
CPU, and hence, non-trivial p_estcpu scheduler penalty. The repeatedly
forked children were always starting from scratch and receiving a scheduler
preference.
of processes:
- Don't initialize rlim_max to RLIM_INFINITY. The limits for those should
be maxfiles and maxproc respectively. Programs expect getrlimit to
return reasonable values, so that they can allocate structures (for
example jdk does this).
- Don't initialize rlim_cur to NOFILE and MAXUPRC respectively, but to
min(NOFILE, maxfiles) and min(MAXUPRC, maxproc) respectively.