calls to reflect this. Also, block statclock rather than softclock during
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).
PID allocation is now MP-safe.
Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).
SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.
Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
body of reaper(), right before the call to uvm_exit(). cpu_wait() must
be done before uvm_exit() because the resources it frees might be located
in the PCB.
remove simplelockrecurse, lockpausetime and PAUSE():
none of these serve any purpose anymore.
in the LOCKDEBUG functions, expand the splhigh() region to
cover the entire function. without this there can still be races.
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.
A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).
getnewvnode now checks this bit, and it if's set makes sure a vnode's not
locked before removing it from the free list.
Closes PR 7954 by Alan Barrett <apb@iafrica.com>.
Fix and document naming convention for vnode variables (always use
lvp/lvpp and uvp/uvpp instead of a hash of cvp, vpp, dvpp, pvp, pvpp).
Delete old stale #if 0'ed code at the end.
Change error path code in getcwd_getcache() slightly (merge common
cleanup code; shouldn't affect behavior any).
mp->mnt_flags & MNT_MWAIT is replaced by mp->mnt_wcnt, and a new mount
flag MNT_GONE is created (reusing the same bit).
In insmntque(), add DIAGNOSTIC check to fail if the filesystem vnode
is being moved to is in the process of being unmounted.
getnewvnode() now protects the list of vnodes active on mp with
vfs_busy()/vfs_unbusy().
To avoid generating spurious errors during a doomed unmount, change
the "wait for unmount to finish" protocol between dounmount() and
vfs_busy(). In vfs_busy(), instead of only sleeping once, sleep until
either MNT_UNMOUNT is clear or MNT_GONE is set; also, maintain a count
of waiters in mp->mnt_wcnt so that dounmount() knows when it's safe to
free mp.
tested by running a "while :; do mount /d1; umount -f /d1; done" loop
against multiple find(1) processes.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.
- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen
In my understanding no code here is subject to export control so it
should be safe.
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.
which use uvm_vslock() should now test the return value. If it's not
KERN_SUCCESS, wiring the pages failed, so the operation which is using
uvm_vslock() should error out.
XXX We currently just EFAULT a failed uvm_vslock(). We may want to do
more about translating error codes in the future.
the process (i.e. pre-Reno behavior). The 4.4BSD behavior (introduced
in Reno) caused transient errors to stick incorrectly.
This is from PR #7640 (Havard Eidnes), cross-checked w/ FreeBSD, where
Bill Fenner committed the same fix (as described in a comment in the
Vat sources, by Van Jacobsen).
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.
vslocking here?! copyout() on its own seems to suffice just about everwhere
else, and it's not like the process is going to exit; it's in a system
call!
serious race condition in sosend(). Upon closer inspection, the appropriate
flags are checked within splsoftnet() for soreceive(), so no change needed
there. Also a little KNFing in sosend().
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is secified as
a low address and a size; machine-dependent code accounts for stack
direction.
This is required for clone(2).
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.
This is required for clone(2).
-a subregion start was ignored if all previous allocations were before
the subregion, reported by Lennart Augustsson in PR kern/7539
-an existing allocation which overlaps the beginning of the subregion
was ignored (ie overlapped) if is is not the last allocation
given buffer, and if necessary, reducing the display width of the
number to fit in the buffer by increasing the units (from kilobytes
(2^10) through to exabytes (2^60)).
count is 0, wait for use count to drain before finishing the close.
This is necessary in order for multiple processes to safely share file
descriptor tables.
* add arguments describing the vnode and ecoff header of the executable
being set up to the [onz]magic setup functions.
* export the stack setup function and the [onz]magic setup functions.
* call the MD ecoff hook _before_ the [onz]magic and stack setup
functions, and bail out early if the MD hook sets up vmcmds.
- Initialize mbpool and mclpool with msize and mclbytes, respectively,
so that those values may be patched and have an actual affect on the
next system reboot.
- Set low water marks on mbpool (default: 16) and mclpool (default: 8).
This should be of great help for diskless systems, which need to allocate
mbufs in order to clean dirty pages; the low water marks increase the
chances of this being possible to do in memory starvation situations.
- Add support for getting/setting some mbuf-related parameters via sysctl.
* msize and mclsize (read-only)
* nmbclusters (read-only unless the platform has direct-mapped pool pages,
in which case the value can be increased).
* mblowat and mcllowat (read/write)