- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
Otherwise the child will incorrectly see it is not running on any
CPU. Among other things, this fixes crashes from having
LD_PRELOAD=libpthread.so set in the env.
reviewed by tech-kern
that scheduler locks are special in this regard - adaptive locks cannot
be in the path due to turnstiles. Randomly spotted/reported by uebayasi@.
- Remove unused lwp_relock() and replace lwp_lock_retry() by simplifying
lwp_lock() and sleepq_enter() a little.
- Give alllwp its own cache-line and mark lwp_cache pointer as read-mostly.
OK ad@
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().
COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.
Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).
Fixes PR/43176.
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to to proc_sesshold() and
proc_sessrele(). The later releases proc_lock now.
Quick OK by <ad>.
for now. This will prevent signals from waking them. Adjust
exit_lwps() to explicitly add LW_SINTR to all of them, so that
the process exit code can wake them up.
This is needed as threads in both of these wait channels die once
they are woken. So they aren't interruptable in the typical sense.
I am now able to suspend & resume firefox successfully now.
one of the following:
- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.
Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
we no longer need to guard against access from hardware interrupt handlers.
Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:
- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.
- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.
- The system spends less time at IPL_SCHED, and there is less lock activity.
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
by yamt@.
- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.
- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.
- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.
Seems to be quite stable. Some work still left to do.
Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.
Reviewed by: <tech-kern>, <ad>
- Better detect simple cycles of threads calling _lwp_wait and return
EDEADLK. Does not handle deeper cycles like t1 -> t2 -> t3 -> t1.
- If there are multiple threads in _lwp_wait, then make sure that
targeted waits take precedence over waits for any LWP to exit.
- When checking for deadlock, also count the number of zombies currently
in the process as potentially reapable. Whenever a zombie is murdered,
kick all waiters to make them check again for deadlock.
- Add more comments.
Also, while here:
- LOCK_ASSERT -> KASSERT in some places
- lwp_free: change boolean arguments to type 'bool'.
- proc_free: let lwp_free spin waiting for the last LWP to exit, there's
no reason to do it here.
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.
Restores source compatibility with pre-newlock2 tools like ps or top.
Reviewed by Andrew Doran.
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.
Fixes kern/35582 (kernel panics with gdb).
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.
Members of the thread group must die without reporting to the parent and
without going to zombie stage. We do that by reparenting to init before
catching a SIGKILL. The parent will not see the child death.
The thread group leader must report the exit status, even if it exits
because of another thread calling exit_group(). We do that by storing the
exit status in struct linux_emuldata_shared, and the exit hook has the
duty of setting struct proc's p_xstat for the thread group leader.
2) For exit/fork/exec hooks, move the NPTL specific code to separate functions
that are shared between COMPAT_LINUX and COMPAT_LINUX32
3) Fix LINUX_CLONE_PARENT_SETTID semantics
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.
while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.