Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.
Members of the thread group must die without reporting to the parent and
without going to zombie stage. We do that by reparenting to init before
catching a SIGKILL. The parent will not see the child death.
The thread group leader must report the exit status, even if it exits
because of another thread calling exit_group(). We do that by storing the
exit status in struct linux_emuldata_shared, and the exit hook has the
duty of setting struct proc's p_xstat for the thread group leader.
2) For exit/fork/exec hooks, move the NPTL specific code to separate functions
that are shared between COMPAT_LINUX and COMPAT_LINUX32
3) Fix LINUX_CLONE_PARENT_SETTID semantics
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.
while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP
re-used by another cpu immediately. in that case, lwp_exit2() will
access freed memory. to fix this:
- remove curlwp from p_lwps in exit1() rather than letting lwp_exit2() do so.
- add assertions to ensure freed proc has no lwps.
kern/24329 from me and kern/24574 from Havard Eidnes.
the latter is not a appropriate place to do so and it broke vfork.
- deactivate pmap before calling cpu_exit() to keep a balance of
pmap_activate/deactivate.
process context ('reaper').
From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediatelly marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit
uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.
MD process/lwp exit code now always calls lwp_exit2() immediatelly after
switching away from the exiting lwp.
g/c now unneeded routines and variables, including the reaper kernel thread
an offset between ss_sp and struct sa_stackinfo_t (located in struct
__pthread_st) when calling sa_register. The kernel increments the
sast_gen counter in struct sastack when an upcall stack is used.
libpthread increments the sasi_stackgen counter in struct
sa_stackinfo_t when an upcall stack is freed. The kernel compares the
two counters to decide if a stack is free or in use.
- add struct sa_stackinfo_t with sasi_stackgen to count stack use in
userland
- add sast_gen to struct sastack to count stack use in kernel
- add SA_FLAG_STACKINFO to enable the stackinfo_offset argument in the
sa_register syscall
- add sa_stackinfo_offset to struct sadata for offset between ss_sp
and struct sa_stackinfo_t
- add ssize_t stackinfo_offset argument to sa_register, initialize
struct sadata's sa_stackinfo_offset from it if SA_FLAG_STACKINFO is
set
- add sa_getstack, sa_getstack0, sa_stackused and sa_setstackfree
functions to find/use/free upcall stacks and use these where
appropriate
- don't record stack for upcall in sa_upcall0
- pass sau to sa_switchcall instead of l2 (l2 = curlwp in sa_switchcall)
- add sa_vp_blocker to struct sadata to pass recently blocked lwp to
sa_switchcall
- delay finding a stack for blocked upcalls to sa_switchcall
- add sa_stacknext to struct sadata pointing to next most likely free
upcall stack; also g/c sa_stackslist in struct sadata and sast_list
in struct sastack
- add L_SA_WOKEN flag: LWP is on sa_woken queue
- add L_SA_RECYCLE flag: LWP should be recycled in sa_setwoken
- replace l_upcallstack with L_SA_WOKEN/L_SA_RECYCLE/L_SA_BLOCKING
flags
- g/c now unused sast_blocker in struct sastack
- make sa_switchcall, sa_upcall0 and sa_upcall_getstate static in
kern_sa.c
- call sa_upcall_userret only once in userret
- split sa_makeupcalls out of sa_upcall_userret and use to process
the sa_upcalls queue
- on process exit: mark LWPs sleeping in saunblock interruptible; also
there are no LWPs sleeping on l->l_upcallstack anymore; also clear
sa_wokenq_head to prevent unblocked upcalls
additional changes:
- cleanup timerupcall sa_vp == curlwp check
- add check in sa_yield if we didn't block on our way here and we
wouldn't any longer be the LWP on the VP
- invalidate sa_vp_ofaultaddr after resolving pagefault
- use splay tree for the pagefault check if the thread was running on
an upcall stack.
=> removes the limitation that all upcall stacks need to be
adjoining and that all upcall stacks have to be loaded with the
1st sys_sa_stacks call.
=> enables keeping information associated with a stack in the kernel
which makes it simpler to find out which LWP is using a stack.
=> allows increasing the SA_MAXNUMSTACKS without having to
allocate an array of that size.
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() to that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find willnow do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.
Welcome to 1.6ZF
combined. Also prepare for adding VP repossession later.
- kern_sa.c: sa_yield/sa_switch: detect if there are pending unblocked
upcalls.
- kern_sa.c: sa_unblock_userret/sa_setwoken: queue LWPs about to invoke
an unblocked upcall on the sa_wokenq. put queued LWPs in a state where
they can be put in the cache. notify LWP on the VP about pending
upcalls.
- kern_sa.c: sa_upcall_userret: check sa_wokenq for pending upcalls,
generate unblocked upcalls with multiple event sas
- kern_sa.c: sa_vp_repossess/sa_vp_donate: g/c, restore original
sa_vp_repossess
- prevent BLOCKED upcalls on double page faults and during upcalls
- make libpthread handle blocked threads which hold locks
- prevent UNBLOCKED upcalls from overtaking their BLOCKED upcall
this adds a new syscall sa_unblockyield
see also http://mail-index.netbsd.org/tech-kern/2003/09/15/0020.html
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.
recorded by p_nlwps) *or* if the process was a SA process. Since
cached SA LWPs aren't counted in p_nlwps, it was possible for
them to not be cleaned up and remain on the alllwp list, pointing to a
dead proc.
need to reparent the process to initproc, so that child wouldn't
have its p_pptr pointer still pointing on the exited parent
pointed out by Dave Sainty in private mail (the patch in kern/14443
didn't have this bug)
reparented back to original parent before it's killed.
This makes the original parent aware that the child has exited if
the debugger failed to wait() on the debugged zombie before exiting.
Since we clear tracing flags before killing the child, the reparenting
logic in wait4() wouldn't be triggered, so it's necessary to do it here.
Problem reported and fix provided in kern/14443 by David Sainty.
kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.
pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.
Reviewed by Christos Zoulas.
space is already torn down in uvmspace_free() when the vmspace
refrence count reaches 0. Move the shmexit() call into uvmspace_free().
Note that there is a beneficial side-effect of deferring the unmap
to uvmspace_free() -- on systems where TLB invalidations are
particularly expensive, the unmapping of the address space won't
have to cause TLB invalidations; uvmspace_free() is going to be
run in a context other than the exiting process's, so the "pmap is
active" test will evaluate to FALSE in the pmap module.
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx
This addresses kern/10981 by Matthew Orgass.
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependant switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different
This was discussed on tech-kern.
<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>
also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.
The old timeout()/untimeout() API has been removed from the kernel.
core filename format, which allow to change the name of the core dump,
and to relocate it in a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resources limits). Process is designed by its pid
at the second level name. These values are inherited on fork, and the corename
fomat is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).
PID allocation is now MP-safe.
Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).
SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.
Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
body of reaper(), right before the call to uvm_exit(). cpu_wait() must
be done before uvm_exit() because the resources it frees might be located
in the PCB.
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.
A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.
This is required for clone(2).