Remove it in ddb/db_syncobj.h too.
New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.
Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.
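A minimal sketch of what the new header might contain (guard name and comments
are illustrative assumptions; historically wchan_t has been an opaque pointer
used as a sleep address):

/* sys/wchan.h -- sketch only; guard name and comments are illustrative. */
#ifndef _SYS_WCHAN_H_
#define _SYS_WCHAN_H_

/* Wait channel: an opaque address identifying what an LWP sleeps on. */
typedef const void *wchan_t;

#endif /* _SYS_WCHAN_H_ */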
Rather than setting the priority boost at each site where we block, make it a
property of syncobj_t (a sketch follows the notes below). Then, do not hang
onto the priority boost until userret(); drop it as soon as the LWP is out of
the run queue and onto a CPU. Holding onto it longer is of questionable
benefit.
- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).
- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).
XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h
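Roughly, the shape of the change looks like this (a sketch only; member names
and layout are illustrative, not the exact in-tree definition):

/* Sketch only: member names are illustrative, not the exact definition. */
typedef struct syncobj {
    char         sobj_name[16];        /* for diagnostics */
    u_int        sobj_flag;
    pri_t        sobj_boostpri;        /* boost applied while blocked */
    void        (*sobj_unsleep)(struct lwp *, bool);
    void        (*sobj_changepri)(struct lwp *, pri_t);
    void        (*sobj_lendpri)(struct lwp *, pri_t);
    struct lwp  *(*sobj_owner)(wchan_t);
} syncobj_t;

/* Instances can now be const: nothing modifies them at run time. */
extern const syncobj_t sleep_syncobj;
extern const syncobj_t mutex_syncobj;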
Previously the usage pattern was:
sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l
As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).
However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.
As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
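For reference, a sketch of the shape of the fix, with simplified signatures
(not the exact kernel API): the syncobj is passed to sleepq_block() explicitly,
so the ktrace decision never reads the possibly-unstable l_syncobj.

/*
 * Sketch only, with simplified signatures.  The caller's syncobj is
 * passed in explicitly, so the "is this a mutex sleep?" test does not
 * depend on l->l_syncobj, which a concurrent sleepq_remove() may reset.
 */
int
sleepq_block(int timo, bool catch_p, const syncobj_t *sobj)
{

    /*
     * Creating a ktrace context-switch record allocates memory,
     * which is forbidden in softint context; blocking on a mutex is
     * allowed there, so skip the record for mutex sleeps.
     */
    if (sobj != &mutex_syncobj) {
        /* ... emit the ktrace context-switch record ... */
    }

    /* ... mi_switch(), handle timeout and signals, return error ... */
    return 0;
}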
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.
- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state would have to be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.
On the hacky benchmarks I have, held over from the transition to 1:1
threading, this restores pthread_cond_signal() perf to radixtree/sleepq
levels, and seems much better than either with pthread_cond_broadcast() and
10 threads. It would be interesting to see what might be achieved with a
lockless lookup, which is within grasp now thanks to pid_table being used
for lookup.
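A sketch of the stabilization idea from the first paragraph of this entry; the
list walk here is only for illustration (the real lookup differs), the point
being that the already-held p->p_lock keeps the found LWP stable:

/* Sketch only: stabilize a by-ID lookup with the already-held p->p_lock. */
struct lwp *
lookup_lwp_sketch(struct proc *p, lwpid_t lid)
{
    struct lwp *l;

    KASSERT(mutex_owned(p->p_lock));
    LIST_FOREACH(l, &p->p_lwps, l_sibling) {
        if (l->l_lid != lid)
            continue;
        if (l->l_stat == LSIDL)        /* not yet visible by ID */
            return NULL;
        return l;    /* stable for as long as p->p_lock is held */
    }
    return NULL;
}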
Rather than each process having its own LWP ID space, LWP IDs now come from
the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.
In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.
Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.
Nudged in this direction by ad@ and chs@.
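A small userland illustration of that property (not part of the commit; uses
_lwp_self(2) from <lwp.h>):

/* Illustration only: in the lead (first) LWP, the LWP ID equals the PID. */
#include <lwp.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    printf("pid=%d lid=%d\n", (int)getpid(), (int)_lwp_self());
    /* For the process's lead LWP the two values are expected to match. */
    return 0;
}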
This identifier uniquely identifies an LWP across the entire system, and will
be used in future improvements in user-space synchronization primitives.
(Test disabled and libc stub not included intentionally so as to avoid
multiple libc version bumps.)
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock
(see the sketch after this entry).
- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.
- Group the locks in struct proc at the end of the struct in their own cache
line.
- Add some comments.
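A sketch of the tree lookup, assuming a per-process radix tree keyed by LID;
the field name p_lwptree is an assumption for illustration:

/* Sketch only: p_lwptree is an assumed field name for the per-proc tree. */
#include <sys/radixtree.h>

struct lwp *
lwp_byid_sketch(struct proc *p, lwpid_t lid)
{

    KASSERT(mutex_owned(p->p_lock));
    return radix_tree_lookup_node(&p->p_lwptree, (uint64_t)lid);
}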
life without its self->pt_lid being filled in.
- Fix an error path in _lwp_create(). If the new LID can't be copied out,
then get rid of the new LWP (i.e. either succeed or fail, not both).
- Mark l_dopreempt and l_nopreempt volatile in struct lwp.
- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs;
  sched_resched_cpu() and sched_resched_lwp() contain the logic for this
  (a sketch of the idea follows this list).
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
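A minimal sketch of the "no lost or duplicate notifications" idea behind
sched_resched_cpu()/sched_resched_lwp(), using a made-up state structure
rather than the real per-CPU fields:

/* Hypothetical sketch: struct resched_state is made up; the real logic
 * lives in sched_resched_cpu()/sched_resched_lwp(). */
#include <sys/atomic.h>
#include <stdbool.h>

struct resched_state {
    volatile unsigned want_resched;    /* 0 = idle, 1 = pending */
};

static void
resched_sketch(struct resched_state *rs, bool is_local_cpu)
{

    /* Claim the "needs resched" state exactly once: a second caller
     * sees it already set and sends no duplicate IPI or AST. */
    if (atomic_swap_uint(&rs->want_resched, 1) != 0)
        return;

    if (is_local_cpu) {
        /* Local CPU: a pending AST is sufficient. */
    } else {
        /* Remote CPU: notify it exactly once (an IPI in the kernel). */
    }
}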
Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.
Cache the original parent process id in p_oppid. Reusing p_opptr here is in
theory prone to a slight race condition.
Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
requesting the value when no appropriate event has been registered.
Add an alternative approach: check the ptrace_state information directly
from the siginfo_t value returned by PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.
Add a couple of compile-time asserts for assumptions in the code.
No functional change intended in existing ptrace(2) software.
All ATF ptrace(2) and ATF GDB tests pass.
This change improves reliability of the threading ptrace(2) code.
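A debugger-side sketch of the two ways to inspect an LWP event described
above (error handling omitted; treat the details as illustrative):

/* Sketch: inspect a stop, first via PT_GET_SIGINFO, then via the
 * still-supported PT_GET_PROCESS_STATE.  Error handling omitted. */
#include <sys/types.h>
#include <sys/ptrace.h>
#include <signal.h>
#include <stdio.h>

static void
report_lwp_event(pid_t child)
{
    struct ptrace_siginfo psi;
    ptrace_state_t st;

    ptrace(PT_GET_SIGINFO, child, &psi, sizeof(psi));
    if (psi.psi_siginfo.si_signo != SIGTRAP ||
        psi.psi_siginfo.si_code != TRAP_LWP)
        return;

    ptrace(PT_GET_PROCESS_STATE, child, &st, sizeof(st));
    if (st.pe_report_event == PTRACE_LWP_CREATE)
        printf("new LWP %d\n", (int)st.pe_lwp);
}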
variable from rodata. The compound literal gets pushed on the stack, so the
padding of the structure was not initialized and was being leaked to userland
in sys___sigaltstack14().
there is no actual bug here, since the buffer is guaranteed to be NUL
terminated.
With KASAN we check the whole buffer to cover the "worst" case, and here
it triggered false positives because the buffer size was not filtered.
Stop disabling LWP create and exit events for PT_SYSCALL tracing.
PT_SYSCALL disabled EXEC reporting for legacy reasons; there is no need
to repeat that for LWP and CHLD events.
Pass full siginfo from trapsignal events (SEGV, BUS, ILL, TRAP, FPE).
This adds missing information about signals like fault address.
Set ps_lwp always.
Before passing siginfo to userland through p_sigctx.ps_info, make sure
that its unused bytes are zeroed. LWP and CHLD events do not set si_addr
and si_trap; those pieces of information are only passed for crashes (like a
software breakpoint).
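A sketch of that zeroing; the member access is abridged and illustrative:

/* Sketch: clear the whole record before filling in only the members that
 * this event uses, so stale bytes cannot reach userland. */
memset(&p->p_sigctx.ps_info, 0, sizeof(p->p_sigctx.ps_info));
p->p_sigctx.ps_info._signo = SIGTRAP;
p->p_sigctx.ps_info._code = TRAP_LWP;
/* si_addr / si_trap remain zero here; they are only filled in for crash
 * signals such as a software breakpoint. */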
LLDB crash reporting now works correctly:
(lldb) r
Process 552 launched: '/tmp/a.out' (x86_64)
Process 552 stopped
* thread #1, stop reason = signal SIGSEGV: invalid address (fault address: 0x123456)
Add an MI toplevel startlwp function.
Switch all userland LWPs to go through lwp_create using an mi_startlwp()
function shared between all MD ABIs.
Add debugger related event handling in mi_startlwp() and continue with
standard p->p_emul->e_startlwp at the end of this routine.
Use eventswitch() to report the LWP exit event in lwp_exit().
ATF ptrace(2) tests signal9 and signal10 now pass.
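A simplified sketch of the flow (debugger-notification details elided; not
the exact in-tree code):

/* Sketch only: debugger notification details elided. */
void
mi_startlwp(void *arg)
{
    struct lwp *l = curlwp;
    struct proc *p = l->l_proc;

    /*
     * If the process is traced and LWP-create reporting is enabled,
     * stop and report the new LWP to the debugger here (this is where
     * eventswitch() is used in the real code).
     */

    /* Then continue with the emulation-specific start routine. */
    (*p->p_emul->e_startlwp)(arg);
}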
kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()
all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
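For example (illustrative caller with a made-up struct foo):

/* Illustrative caller (struct foo is made up): KM_SLEEP allocations cannot
 * return NULL, so the commented-out assertion is redundant. */
#include <sys/kmem.h>

struct foo { int f_refcnt; };

struct foo *
foo_create(void)
{
    struct foo *f = kmem_zalloc(sizeof(*f), KM_SLEEP);

    /* KASSERT(f != NULL);  <- unnecessary, the allocator asserts this */
    f->f_refcnt = 1;
    return f;
}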
lwp_create() has acquired more arguments and the newest one was missing here.
By analogy with the changes made in the same commit to other source files,
use &SS_INIT.
in the case of process exit. It is necessary to re-flag all LWPs for exit, as
their state might have changed or new LWPs might have been spawned.
Should fix PR/46168 and PR/46402.
assertion failure (and thus a crash in DIAGNOSTIC kernels). Independently
discovered by YAMAMOTO Takashi and Joel Sing.
To avoid this, introduce a cpu_mcontext_validate() function and move all
sanity checks from cpu_setmcontext() there. Also untangle the netbsd32
compat mess slightly and add a cpu_mcontext32_validate() cousin there.
Add an exhaustive atf test case, based partly on code from Joel Sing.
Should finally fix the remaining open part of PR kern/43903.
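A sketch of the resulting call pattern (simplified; the real MD code differs
per architecture):

/* Sketch: validation is now separate from installation, so callers can
 * reject a bogus mcontext before committing to it. */
int
setmcontext_sketch(struct lwp *l, const mcontext_t *mcp, unsigned int flags)
{
    int error;

    /* All sanity checks live in cpu_mcontext_validate() now. */
    error = cpu_mcontext_validate(l, mcp);
    if (error != 0)
        return error;

    /* ... install the validated register state ... */
    return 0;
}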
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64;
  glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.
- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.
Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).
Discussed on <tech-kern>, reviewed by <ad>.
This will be used to support TLS. The MD method must match the ELF TLS spec
for that CPU architecture (if there is a spec).
At this time it is only implemented for i386, where it means setting the
per-thread base address for %gs. Please implement this for your platform!
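A sketch of how the MI and MD pieces fit together, per the lwp_setprivate()
mentioned earlier; treat it as illustrative rather than the exact in-tree code:

/* Sketch: MI wrapper storing the per-thread pointer and letting the MD
 * hook load it into the TLS base register (e.g. %gs on i386). */
int
lwp_setprivate(struct lwp *l, void *ptr)
{
    int error = 0;

    l->l_private = ptr;
#ifdef __HAVE_CPU_LWP_SETPRIVATE
    error = cpu_lwp_setprivate(l, ptr);    /* MD: set the TLS base */
#endif
    return error;
}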