Add support for tracing vfork(2) events in the context of ptrace(2).
This API covers other frontends to fork1(9) like posix_spawn(2) or clone(2),
if they cause parent to wait for exec(2) or exit(2) of the child.
Changes:
- Add new argument to sigswitch() determining whether we need to acquire
the proc_lock or whether it's already held.
- Refactor fork1(9) for fork(2) and vfork(2)-like events.
Call sigswitch() from fork(1) for forking or vforking parent, instead of
emitting kpsignal(9). We need to emit the signal and suspend the parent,
returning to user and relock proc_lock.
- Add missing prototype for proc_stop_done() in kern_sig.c.
- Make sigswitch a public function accessible from other kernel code
including <sys/signalvar.h>.
- Remove an entry about unimplemented PTRACE_VFORK in the ptrace(2) man page.
- Permin PTRACE_VFORK in the ptrace(2) frontend for userland.
- Remove expected failure for unimplemented PTRACE_VFORK tests in the ATF
ptrace(2) test-suite.
- Relax signal routing constraints under a debugger for a vfork(2)ed child.
This intended to protect from signaling a parent of a vfork(2)ed child that
called PT_TRACE_ME, but wrongly misrouted other signals in vfork(2)
use-cases.
Add XXX comments about still existing problems and future enhancements:
- correct vfork(2) + PT_TRACE_ME handling.
- fork1(2) handling of scenarios when a process is collected in valid but
rare cases.
All ATF ptrace(2) fork[1-8] and vfork[1-8] tests pass.
Fix PR kern/51630 by Kamil Rytarowski (myself).
Sponsored by <The NetBSD Foundation>
This means that the full executable path is always available.
- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change
TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?
PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.
PTRACE_VFORK_DONE is notification to notify a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.
Sponsored by <The NetBSD Foundation>
The SIGTRAP signal is thrown from the kernel if EVENT_MASK (ptrace_event)
enables PTRACE_FORK. This new si_code helps debuggers to distinguish the
exact source of signal delivered for a debugger.
Another purpose of TRAP_CHLD is to retain the same behavior inside the
NetBSD kernel for process child traps and have an interface to monitor it.
Retrieving exact event and extended properties of process child trap is
available with PT_GET_PROCESS_STATE.
There is no behavior change for existing software.
This si_code value is NetBSD extension.
Sponsored by <The NetBSD Foundation>
This removes dead code introduced with the following commit:
date: 2012-07-27 22:52:49 +0200; author: christos; state: Exp; lines: +8 -2;
revert racy vfork() parent-blocking-before-child-execs-or-exits code.
ok rmind
to relock the lock you unlocked. Otherwise the lock you unlocked won't
walk the walk, not by a long chalk, and you'll end up getting mocked.
From Mateusz Guzik of FreeBSD via freenode.
XXX: pullup-6 and -7
can named the same as those on other platforms.
For example, proc:::exec-success, not proc:::exec_success.
Implementation follows the same basic principle as FreeBSD's; add
another field to the SDT_PROBE_DEFINE macro which is the name
as exposed to userland.
1. When we vfork() PL_PPWAIT is set, and that makes us do regular disposition
of the TRAP signal, and not indirect through the debugger.
2. The parent needs to keep running, so that the debugger can release it.
Unfortunately, with vfork() we cannot release the parent because it will
eventually core-dump since the parent and the child cannot run on the
same address space.
- Simplify message queue descriptor unlinking and closure operations.
- Update proc_t::p_mqueue_cnt atomically. Inherit it on fork().
- Use separate allocation for the name of message queue.
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.
no functional change
- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.
Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).
Discussed on <tech-kern>, reviewed by <ad>.
we no longer need to guard against access from hardware interrupt handlers.
Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:
- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.
- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.
- The system spends less time at IPL_SCHED, and there is less lock activity.
instead of doing it in syscall_intern().
Note that syscall_intern() must still be called when the flags change
since many ports use a different copy of the syscall entry code when
tracing is enabled.
Been running in my tree for over a month at least.
Reviewed and okay yamt@, and special thanks to him as well as rittera@
for making this possible through fixing NDIS to not call fork1() with
l1 != curlwp.