Commit Graph

88 Commits

Author SHA1 Message Date
riastradh b557f9979d kern_lwp.c: Sort includes. No functional change intended. 2023-10-15 10:29:24 +00:00
riastradh fac91bbe0f sys/lwp.h: Nix sys/syncobj.h dependency.
Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.
2023-10-15 10:27:11 +00:00
ad 32a89764db Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block().  Then, it's possible to make cv_signal()
work as expected and only ever wake a single LWP.
2023-10-08 13:23:05 +00:00
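A minimal sketch of the condvar pattern this change makes dependable; cv_signal(9), cv_wait(9), and mutex(9) are the real interfaces, while work_lock, work_cv, and work_pending are hypothetical names (initialization elided):

#include <sys/condvar.h>
#include <sys/mutex.h>

static kmutex_t work_lock;      /* mutex_init(..., MUTEX_DEFAULT, IPL_NONE) */
static kcondvar_t work_cv;      /* cv_init(&work_cv, "work") */
static int work_pending;

static void
work_produce(void)
{
    mutex_enter(&work_lock);
    work_pending++;
    cv_signal(&work_cv);        /* after this change: wakes exactly one LWP */
    mutex_exit(&work_lock);
}

static void
work_consume(void)
{
    mutex_enter(&work_lock);
    while (work_pending == 0)
        cv_wait(&work_cv, &work_lock);  /* legitimate wakeup: no error */
    work_pending--;
    mutex_exit(&work_lock);
}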
ad 0a6ca13bec Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.
2023-10-04 20:29:18 +00:00
ad 6ed72b5fad - Simplify how priority boost for blocking in kernel is handled. Rather
  than setting it up at each site where we block, make it a property of
  syncobj_t.  Then, do not hang onto the priority boost until userret(),
  drop it as soon as the LWP is out of the run queue and onto a CPU.
  Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
  simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
  like lwp_lock() which turn out not to be small after all (I don't know
  why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
  beyond what volatile does).
2023-09-23 18:48:04 +00:00
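A rough sketch of the resulting shape of syncobj_t under these two changes; the layout and the member name sobj_boostpri are assumptions for illustration:

/* Sketch only: the priority boost becomes a property of the syncobj,
 * and instances are const. */
typedef struct syncobj {
    char    sobj_name[16];      /* diagnostic name, added in the commit below */
    u_int   sobj_flag;
    pri_t   sobj_boostpri;      /* boost while blocked; dropped as soon as
                                   the LWP leaves the run queue for a CPU */
    void    (*sobj_unsleep)(struct lwp *, bool);
    void    (*sobj_changepri)(struct lwp *, pri_t);
} syncobj_t;

static const syncobj_t mutex_syncobj_sketch = {
    .sobj_name     = "mutex",
    .sobj_boostpri = PRI_KERNEL,        /* PRI_KERNEL is an assumption here */
};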
riastradh f485358332 kern: New struct syncobj::sobj_name member for diagnostics.
XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h
2023-07-17 12:54:29 +00:00
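Illustrative only: with a name carried in the syncobj, diagnostics can say what an LWP is blocked on. This helper is hypothetical, not part of the commit:

/* Hypothetical ddb-style helper using the new sobj_name member. */
static void
lwp_sleep_diag(const struct lwp *l)
{
    if (l->l_wchan != NULL)
        printf("lwp %d blocked on %s (wchan %p)\n",
            (int)l->l_lid, l->l_syncobj->sobj_name, l->l_wchan);
}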
riastradh 7baa9e8e90 sleepq(9): Pass syncobj through to sleepq_block.
Previously the usage pattern was:

sleepq_enter(sq, l, lock);              // locks l
...
sleepq_enqueue(sq, ..., sobj, ...);     // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...);                      // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation, and allocation is
forbidden in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l.  At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj.  If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.)  But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
2022-06-29 22:27:01 +00:00
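A condensed sketch of the fixed sequence, with argument lists abbreviated: the syncobj now travels by argument, so a concurrent sleepq_remove() resetting l_syncobj cannot mislead the ktrcsw check.

sleepq_enter(sq, l, lock);                        // locks l
sleepq_enqueue(sq, wchan, wmesg, sobj, catch_p);  // sets l_syncobj
// (*) l may be unlocked/relocked here, e.g. by turnstile_lendpri()
error = sleepq_block(timo, catch_p, sobj);        // consults sobj directly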
ad 0eaaa024ea Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
2020-05-23 23:42:41 +00:00
ad 20180cb18f - Replace pid_table_lock with a lockless lookup covered by pserialize, with
  the "writer" side being pid_table expansion.  The basic idea is that when
  doing an LWP lookup there is usually already a lock held (p->p_lock), or a
  spin mutex that needs to be taken (l->l_mutex), and either can be used to
  get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
  by ID"), and lookup by ID in proc0 doesn't really happen.  In-tree the new
  state should be understood by top(1), the tty subsystem and so on, and
  would attract the attention of 3rd party kernel grovellers in time, so
  remove it and just rely on LSIDL.
2020-05-23 20:45:10 +00:00
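A generic sketch of the read side of such a lookup; pserialize(9), atomic_load_consume(9), and lwp_lock() are real interfaces, while the table and slot layout here are hypothetical:

#include <sys/atomic.h>
#include <sys/pserialize.h>

/* Hypothetical table layout for illustration. */
extern struct pid_slot { struct lwp *pt_lwp; } pid_table[];
extern u_int pid_tbl_mask;

/* Sketch: lockless lookup, then stabilize with the LWP's spin mutex. */
struct lwp *
lwp_lookup_sketch(lwpid_t lid)
{
    struct lwp *l;
    int s;

    s = pserialize_read_enter();
    l = atomic_load_consume(&pid_table[lid & pid_tbl_mask].pt_lwp);
    if (l != NULL)
        lwp_lock(l);    /* caller re-checks ID/state, then unlocks */
    pserialize_read_exit(s);
    return l;
}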
ad 3235a637ea lwp_unpark(): no need to acquire LWP refs or drop the proc lock.
On the hacky benchmarks I have, held over from the transition to 1:1
threading, this restores pthread_cond_signal() perf to radixtree/sleepq
levels, and seems much better than either with pthread_cond_broadcast() and
10 threads.  It would be interesting to see what might be achieved with a
lockless lookup, which is within grasp now thanks to pid_table being used
for lookup.
2020-05-05 22:12:06 +00:00
thorpej 156895706e Overhaul the way LWP IDs are allocated. Instead of each process having its
own LWP ID space, LWP IDs now come from the same number space as PIDs.  The
lead LWP of a process gets the PID as its LID.  If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.
2020-04-24 03:22:06 +00:00
thorpej 44fb992d10 Remove _lwp_gettid(2) system call. This problem is going to be solved
another way.  (Note: this call was never exposed in libc, so we can just
recycle the syscall number.)
2020-04-22 21:22:21 +00:00
ad 46a9878a41 Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).
2020-04-19 20:35:29 +00:00
thorpej 98a9cebbb6 Add support for lazily generating a "global thread ID" for an LWP. This
identifier uniquely identifies an LWP across the entire system, and will
be used in future improvements in user-space synchronization primitives.

(Test disabled and libc stub not included intentionally so as to avoid
multiple libc version bumps.)
2020-04-04 20:20:12 +00:00
ad c36937211e Update comments 2020-01-30 12:36:38 +00:00
ad d1c42b4f7b - Track LWPs in a per-process radixtree. It uses no extra memory in the
  single threaded case.  Replace scans of p->p_lwps with lookups in the
  tree.  Find free LIDs for new LWPs in the tree.  Replace the hashed sleep
  queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
  return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
  line.

- Add some comments.
2020-01-29 15:47:51 +00:00
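Illustrative sketch of the per-process tree lookup; radix_tree_lookup_node(9) is the real API, and the member name p_lwptree is an assumption:

#include <sys/radixtree.h>

/* Sketch: tree lookup by LID replaces a scan of p->p_lwps.
 * Caller holds p->p_lock. */
struct lwp *
proc_find_lwp_sketch(struct proc *p, lwpid_t lid)
{
    return radix_tree_lookup_node(&p->p_lwptree, (uint64_t)lid);
}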
ad 6c994951d8 Correction to previous: don't leak newuc if copyout() fails. 2020-01-26 19:08:09 +00:00
ad edf01486dd - Fix a race between the kernel and libpthread, where a new thread can start
  life without its self->pt_lid being filled in.

- Fix an error path in _lwp_create().  If the new LID can't be copied out,
  then get rid of the new LWP (i.e. either succeed or fail, not both).

- Mark l_dopreempt and l_nopreempt volatile in struct lwp.
2020-01-25 15:41:52 +00:00
ad 11ba4e1830 Minor scheduler cleanup:
- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
  sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
2019-11-23 19:42:52 +00:00
kamil 5e4bbc4985 Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo
Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
requesting the value in cases where no appropriate event has been
registered.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.
2019-09-30 21:13:33 +00:00
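Illustrative userland usage of the siginfo-based approach; PT_GET_SIGINFO and struct ptrace_siginfo are the real interfaces, while the surrounding checks are abbreviated:

#include <sys/types.h>
#include <sys/ptrace.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>

/* Sketch: read event details straight from the signal's siginfo. */
struct ptrace_siginfo psi;

if (ptrace(PT_GET_SIGINFO, pid, &psi, sizeof(psi)) == -1)
    err(1, "PT_GET_SIGINFO");
if (psi.psi_siginfo.si_code == TRAP_LWP)
    printf("LWP event, lwpid %d\n", psi.psi_lwpid);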
maxv 3583c449f2 Fix info leak: instead of using SS_INIT as a compound literal, use a global
variable from rodata. The compound gets pushed on the stack, the padding
of the structure was therefore not initialized, and was getting leaked to
userland in sys___sigaltstack14().
2019-07-10 17:52:22 +00:00
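The shape of the fix, sketched: a compound literal is built on the stack, so its padding bytes hold whatever the stack held, while a const object in rodata is zero-filled in its entirety. The surrounding code and initializer are illustrative:

/* Before (sketch): stack temporary; padding uninitialized, later
 * copied out to userland. */
l->l_sigstk = (stack_t){ .ss_flags = SS_DISABLE };

/* After (sketch): rodata object, every byte (padding included)
 * initialized to zero. */
static const stack_t ss_init = { .ss_flags = SS_DISABLE };
l->l_sigstk = ss_init;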
maxv 413b53f543 Restrict the size given to copyoutstr. It is safer to do so even if
there is no actual bug here, since the buffer is guaranteed to be NUL
terminated.

With KASAN we check the whole buffer to cover the "worst" case, and here
it triggered false positives because the buffer size was not filtered.
2019-07-01 17:15:43 +00:00
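The pattern, sketched with hypothetical names: clamp the length handed to copyoutstr(9) to the kernel buffer's size, so nothing ever reads past the buffer even though the string inside it is NUL-terminated:

/* Sketch: bound the copy by the source buffer, not the caller's len. */
char kbuf[32];          /* hypothetical kernel buffer */
size_t done;
int error;

error = copyoutstr(kbuf, uaddr, MIN(len, sizeof(kbuf)), &done);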
kamil efd4138069 Register KTR events for debugger related signals
Register signals for:

 - crashes (FPE, SEGV, ILL, BUS)
 - LWP events
 - CHLD (FORK/VFORK/VFORK_DONE) events -- temporarily disabled
 - EXEC events

While there, refactor related functions in order to simplify the code.

Add missing comment documentation for recently added kernel functions.
2019-05-03 22:34:21 +00:00
kamil ac37cdce0c Introduce fixes for ptrace(2)
Stop disabling LWP create and exit events for PT_SYSCALL tracing.
PT_SYSCALL disabled EXEC reporting for legacy reasons; there is no need
to repeat that for LWP and CHLD events.

Pass full siginfo from trapsignal events (SEGV, BUS, ILL, TRAP, FPE).
This adds missing information about signals like fault address.

Set ps_lwp always.

Before passing siginfo to userland through p_sigctx.ps_info, make sure
that its unused bytes were zeroed. LWP and CHLD events do not set si_addr
and si_trap; these pieces of information are passed for crashes (like a
software breakpoint).

LLDB crash reporting now works correctly:

(lldb) r
Process 552 launched: '/tmp/a.out' (x86_64)
Process 552 stopped
* thread #1, stop reason = signal SIGSEGV: invalid address (fault address: 0x123456)
2019-05-02 22:23:49 +00:00
kamil 5b66da1dba Call MD code in mi_startlwp() before MI check for debugger
This allows getting an initialized mcontext.
2019-05-01 22:55:55 +00:00
kamil d1fa1f15ea Correct passing debugger related events for LWP create and exit
Add MI toplevel startlwp function.

Switch all userland LWPs to go through lwp_create using an
mi_startlwp() function shared between all MD ABIs.

Add debugger related event handling in mi_startlwp() and continue with
standard p->p_emul->e_startlwp at the end of this routine.

Use eventswitch() to notify the debugger of LWP exit in lwp_exit().

ATF ptrace(2) tests signal9 and signal10 now pass.
2019-05-01 21:57:34 +00:00
ozaki-r 8812081aa6 Apply C99-style struct initialization to syncobj_t 2018-01-30 07:52:22 +00:00
christos 85bf85b701 make _lwp_park return the remaining time to sleep in the "ts" argument
if it is a relative timestamp, as discussed in tech-kern.
XXX: pullup-8
2017-12-08 01:19:29 +00:00
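Roughly what returning the remaining time looks like, sketched with hypothetical variable names (deadline computed at sleep start); getnanotime(9) and the timespec macros are real:

/* Sketch: on wakeup, report the unslept part of a relative timeout
 * back through the user's timespec. */
struct timespec now, remaining;

getnanotime(&now);
timespecsub(&deadline, &now, &remaining);   /* deadline = start + ts */
if (remaining.tv_sec < 0)
    timespecclear(&remaining);
error = copyout(&remaining, uts, sizeof(remaining));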
chs fd34ea77eb remove checks for failure after memory allocation calls that cannot fail:
  kmem_alloc() with KM_SLEEP
  kmem_zalloc() with KM_SLEEP
  percpu_alloc()
  pserialize_create()
  psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
2017-06-01 02:45:05 +00:00
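The shape of the cleanup, sketched: KM_SLEEP blocks until the allocation succeeds (and the allocator asserts internally), so the caller's failure branch is dead code.

/* Before (sketch): unreachable failure check. */
sc = kmem_zalloc(sizeof(*sc), KM_SLEEP);
if (sc == NULL)         /* never true with KM_SLEEP */
    return ENOMEM;

/* After (sketch): */
sc = kmem_zalloc(sizeof(*sc), KM_SLEEP);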
kamil 795febebbd Try to fix build of sys_lwp.c
lwp_create() has acquired more arguments, and the latest one was missing
here. By analogy with the changes this commit makes to other source files,
go for &SS_INIT.
2017-04-21 19:38:35 +00:00
christos d7746f2ee3 - Propagate the signal mask from the ucontext_t to the newly created thread
as specified by _lwp_create(2)
- Reset the signal stack for threads created with _lwp_create(2)
2017-04-21 15:10:34 +00:00
maya 8341f84221 use a bounded string copy 2017-01-15 01:28:14 +00:00
maxv 6647020bbc Unused inits (harmless).
Found by Brainy.
2015-07-24 13:02:52 +00:00
christos 4cec95f0ea Centralize the conversion of struct timespec to the int timo.
Make lwp_park take the regular arguments for specifying what kind
of timeout we supply like clock_nanosleep(), namely clockid_t and flags.
2013-03-29 01:08:17 +00:00
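A sketch of the centralized conversion; a helper of the ts2timo() shape in subr_time.c is assumed here, and its exact signature is an approximation:

/* Sketch: convert a user timespec (absolute or relative, per
 * clock_id/flags as in clock_nanosleep()) into ticks for sleepq. */
int timo = 0;
struct timespec ts, start;

if (uts != NULL) {
    if ((error = copyin(uts, &ts, sizeof(ts))) != 0)
        return error;
    if ((error = ts2timo(clock_id, flags, &ts, &timo, &start)) != 0)
        return error;
}
/* timo now feeds sleepq_block(timo, ...) */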
rmind ea775f7598 exit_lwps, lwp_wait: fix a race condition by re-trying if p_lock was dropped
in the case of process exit.  It is necessary to re-flag all LWPs for exit,
as their state might have changed or new LWPs might have been spawned.

Should fix PR/46168 and PR/46402.
2012-09-27 20:43:15 +00:00
martin 6c3cc552c2 Calling _lwp_create() with a bogus ucontext could trigger a kernel
assertion failure (and thus a crash in DIAGNOSTIC kernels). Independently
discovered by YAMAMOTO Takashi and Joel Sing.

To avoid this, introduce a cpu_mcontext_validate() function and move all
sanity checks from cpu_setmcontext() there. Also untangle the netbsd32
compat mess slightly and add a cpu_mcontext32_validate() cousin there.

Add an exhaustive atf test case, based partly on code from Joel Sing.

Should finally fix the remaining open part of PR kern/43903.
2012-05-21 14:15:16 +00:00
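The control flow of the fix, sketched with abbreviated arguments: sanity-check the user-supplied mcontext up front, so a bogus ucontext produces an error instead of tripping a DIAGNOSTIC assertion:

/* Sketch: validate before committing (mcp and flags abbreviated). */
error = cpu_mcontext_validate(l, mcp);
if (error != 0)
    return error;       /* bogus ucontext -> EINVAL, not a KASSERT */
return cpu_setmcontext(l, mcp, flags);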
rmind ad12c77015 Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.
2012-02-19 21:05:51 +00:00
chs 33fa5ccbbf many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
 - support new-style (NPTL) linux pthreads on all platforms.
   clone() with CLONE_THREAD uses 1 process with many LWPs
   instead of separate processes.
 - move the contents of sys__lwp_setprivate() into a new
   lwp_setprivate() and use that everywhere.
 - update linux_release[] and linux32_release[] to "2.6.18".
 - adjust placement of emul fork/exec/exit hooks as needed
   and adjust other emul code to match.
 - convert all struct emul definitions to use named initializers.
 - change the pid allocator to allow multiple pids to refer to the same proc.
 - remove a few fields from struct proc that are no longer needed.
 - disable the non-functional "vdso" code in linux32/amd64,
   glibc works fine without it.
 - fix a race in the futex code where we could miss a wakeup after
   a requeue operation.
 - redo futex locking to be a little more efficient.
2010-07-07 01:30:32 +00:00
yamt d5dec378f9 increment p_nrlwps in lwp_create rather than letting callers do so
as it's always decremented by lwp_exit.  This fixes error recovery of
e.g. aio_procinit.
2010-06-13 04:13:31 +00:00
skrll 1c518780b3 Follow the correct locking protocol when creating an LWP and the process
is stopping.

Problem found by running the gdb testsuite (gdb didn't have pthreads
support)

Thanks to rmind for help with this.
2010-06-06 07:46:17 +00:00
rmind 13f624ca0f Remove lwp_uc_pool, replace it with kmem(9), plus add some consistency.
As discussed, a while ago, with ad@.
2010-04-23 19:18:09 +00:00
rmind b9a294cf04 - Move inittimeleft() and gettimeleft() to subr_time.c, where they belong.
- Move abstimeout2timo() there too and export.  Use it in lwp_park().
2009-11-01 21:46:09 +00:00
rmind 30d0b02e57 Make lwp_park_sobj and lwp_park_tab static.
Wrap long lines while here.
2009-10-22 13:12:47 +00:00
rmind 40cf6f3659 Remove uarea swap-out functionality:
- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code.  Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
2009-10-21 21:11:57 +00:00
ad ead83a47c8 _lwp_setprivate: provide the value to MD code if a hook is present.
This will be used to support TLS. The MD method must match the ELF TLS spec
for that CPU architecture (if there is a spec).

At this time it is only implemented for i386, where it means setting the
per-thread base address for %gs. Please implement this for your platform!
2009-03-29 09:24:52 +00:00
christos 461a86f9bd merge christos-time_t 2009-01-11 02:45:45 +00:00
ad ad507e54f8 _lwp_kill: set SI_LWP in the siginfo, not SI_USER. 2008-10-16 08:47:07 +00:00
wrstuden fc7511b00e Merge wrstuden-revivesa into HEAD. 2008-10-15 06:51:17 +00:00
ad 93e0e98369 Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.
2008-05-26 12:08:38 +00:00
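The resulting shape, sketched (the removed member names are approximations):

#include <sys/queue.h>

/*
 * Before, a sleepq_t carried its own mutex pointer and waiter count
 * (roughly: sq_mutex, sq_waiters).  Those values are maintained
 * elsewhere now, so the type reduces to a queue head:
 */
typedef TAILQ_HEAD(sleepq, lwp) sleepq_t;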
martin ce099b4099 Remove clause 3 and 4 from TNF licenses 2008-04-28 20:22:51 +00:00