Commit Graph

241 Commits

maxv
7040dfd053 Permanent node doesn't need a log, plus the log gets leaked anyway. Found
by kLSan.
2020-06-22 16:21:29 +00:00
ad
3ce4840c59 lwp_exit(): add a warning about (l != curlwp) 2020-06-06 22:26:47 +00:00
thorpej
69cb27bb5c lwp_thread_cleanup(): Remove overly-aggressive assertion. 2020-06-01 13:58:14 +00:00
ad
0eaaa024ea Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
2020-05-23 23:42:41 +00:00
ad
20180cb18f - Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion.  The basic idea is that when
  doing an LWP lookup there is usually already a lock held (p->p_lock), or a
  spin mutex that needs to be taken (l->l_mutex), and either can be used to
  get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
  by ID"), and lookup by ID in proc0 doesn't really happen.  In-tree the new
  state should be understood by top(1), the tty subsystem and so on, and
  would attract the attention of 3rd party kernel grovellers in time, so
  remove it and just rely on LSIDL.
2020-05-23 20:45:10 +00:00
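
A hedged sketch of the read-side pattern this describes, using the pserialize(9) and mutex(9) interfaces; pid_table_lookup_raw() is a hypothetical stand-in for the raw slot read, not the committed code:

    #include <sys/proc.h>
    #include <sys/pserialize.h>
    #include <sys/mutex.h>

    struct proc *pid_table_lookup_raw(pid_t);  /* hypothetical raw slot read */

    /* Hedged sketch: lockless PID -> proc lookup, stabilized by p_lock. */
    struct proc *
    proc_lookup_sketch(pid_t pid)
    {
            struct proc *p;
            int s;

            s = pserialize_read_enter();    /* no pid_table_lock needed */
            p = pid_table_lookup_raw(pid);
            if (p != NULL) {
                    /* Take the already-existing lock to make the found
                       object stable, then re-check it is the right one. */
                    mutex_enter(p->p_lock);
                    if (p->p_pid != pid) {
                            mutex_exit(p->p_lock);
                            p = NULL;
                    }
            }
            pserialize_read_exit(s);
            return p;       /* NULL, or the proc with p_lock held */
    }
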
thorpej
59150873b5 - proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc.  Add a separate proc_find_lwpid() to look up a
  proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
  *and* a proc given the ID of any LWP.  Returns with the proc::p_lock
  held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
  allow the proc to be wildcarded, rather than just curproc or a
  specific proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
  in a much nicer way, so garbage-collect the remnants of that recently
  added mechanism.
2020-04-29 01:52:26 +00:00
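
A short usage sketch of the new lookup; the (pid_t, struct proc **) signature and the hold on proc::p_lock follow the description above, but the surrounding code is illustrative:

    #include <sys/proc.h>
    #include <sys/lwp.h>
    #include <sys/mutex.h>

    /* Hedged sketch: find an LWP by ID with the proc wildcarded. */
    static void
    lwp_lookup_example(pid_t lid)
    {
            struct proc *p = NULL;
            struct lwp *l;

            l = proc_find_lwp_acquire_proc(lid, &p);  /* assumed signature */
            if (l == NULL)
                    return;                 /* no such LWP */
            /* ... use l here; the owning proc's p_lock is held ... */
            mutex_exit(p->p_lock);
    }
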
thorpej
276ef22378 Add a NetBSD native futex implementation, mostly written by riastradh@.
Map the COMPAT_LINUX futex calls to the native ones.
2020-04-26 18:53:31 +00:00
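
The futex contract: FUTEX_WAIT sleeps only if the word still holds the expected value when the kernel re-checks it under its own lock, and FUTEX_WAKE wakes threads queued on that address. A hedged userland sketch of a tiny lock built on it; futex_wait() and futex_wake() are hypothetical wrappers around the syscall, not a NetBSD API:

    #include <stdatomic.h>

    /* Hypothetical wrappers over the futex syscall (not shown). */
    void futex_wait(atomic_int *uaddr, int expected);
    void futex_wake(atomic_int *uaddr, int nwaiters);

    static atomic_int lockword;     /* 0 = free, 1 = held */

    static void
    lock(void)
    {
            int expect = 0;
            while (!atomic_compare_exchange_weak(&lockword, &expect, 1)) {
                    /* Sleep only if the lock is still held (value 1);
                       the kernel's re-check avoids a lost wakeup. */
                    futex_wait(&lockword, 1);
                    expect = 0;
            }
    }

    static void
    unlock(void)
    {
            atomic_store(&lockword, 0);
            futex_wake(&lockword, 1);       /* wake one waiter */
    }
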
thorpej
156895706e Overhaul the way LWP IDs are allocated. Instead of each process having
its own LWP ID space, LWP IDs now come from the same number space as PIDs.
The lead LWP of a process gets the PID as its LID.  If a multi-LWP
process's lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.
2020-04-24 03:22:06 +00:00
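
One user-visible consequence of the unified ID space, sketched below: a single-threaded program's lead (and only) LWP should report an LWP ID equal to its PID, assuming a kernel with this change:

    #include <stdio.h>
    #include <unistd.h>
    #include <lwp.h>

    int
    main(void)
    {
            /* The lead LWP takes the PID as its LID. */
            printf("pid=%d lwp=%d\n", (int)getpid(), (int)_lwp_self());
            return 0;
    }
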
ad
bdfa5c0012 lwp_wait(): don't need to check for process exit, cv_wait_sig() does it. 2020-04-19 23:05:04 +00:00
thorpej
98a9cebbb6 Add support for lazily generating a "global thread ID" for a LWP. This
identifier uniquely identifies an LWP across the entire system, and will
be used in future improvements in user-space synchronization primitives.

(Test disabled and libc stub not included intentionally so as to avoid
multiple libc version bumps.)
2020-04-04 20:20:12 +00:00
maxv
b3036422e2 Drop specificdata from KCOV, kMSan doesn't interact well with it. Also
reduces the overhead.
2020-04-04 06:51:46 +00:00
ad
6a317a61b7 Fix crash observed with procfs on current-users by David Hopper. LWP refcnt
and p_zomblwp both must reach the needed state, and LSZOMB be set, under a
single hold of p_lock.
2020-03-26 21:31:55 +00:00
ad
a977daea03 softint_overlay() (slow case) gains ~nothing but creates potential headaches.
In the interests of simplicity remove it and always use the kthreads.
2020-03-26 20:19:06 +00:00
ad
58ea2fbbd4 PR kern/55020: dbregs_dr?_dont_inherit_lwp test cases fail on real hardware
lwp_wait(): make the check for deadlock much more permissive.
2020-03-08 17:04:45 +00:00
ad
b90b824e65 Remove an unneeded ifdef MULTIPROCESSOR. 2020-02-27 20:52:25 +00:00
ad
82002773ec - Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky.  May help with the "softint screwup"
  panic.

- Correct the memory barriers around zombies switching into oblivion.
2020-02-15 18:12:14 +00:00
ad
980ef21298 PR kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
Create an lwp_renumber() from the code in emulexec() and use it in
linux_e_proc_exec() and linux_e_proc_fork() too.
2020-02-15 17:13:55 +00:00
dogcow
277fc7271a fix compilation failure for arches without l_pcu_valid
ok riastradh
2020-02-11 06:09:48 +00:00
riastradh
a9d52c09c3 Preserve pcu(9) state in fork.
There should perhaps be a pcu_fork operation to keep this factored
neatly but this will be simpler to pull up.

In practical terms, this may not affect most architectures that use
pcu(9) -- alpha, arm32, mips, powerpc, riscv -- but it does affect
aarch64, in which v8-v15 are callee-saved, and GCC actually takes
advantage of them, and for more than just floating-point data too.

XXX pullup
2020-02-11 03:14:49 +00:00
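
A hedged sketch of the shape of the fix: flush the forking LWP's lazily-held pcu(9) state into its PCB before the child copies it; pcu_save_all() is the pcu(9) interface, while the helper around it is illustrative:

    #include <sys/pcu.h>

    /* Hedged sketch: l1 is the forking LWP (assumed variable name). */
    static void
    fork_pcu_sketch(struct lwp *l1)
    {
            /* Write any live FPU/SIMD register state back to l1's PCB
               so the child inherits it rather than stale contents. */
            pcu_save_all(l1);
    }
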
ad
d1c42b4f7b - Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case.  Replace scans of p->p_lwps with lookups in the
  tree.  Find free LIDs for new LWPs in the tree.  Replace the hashed sleep
  queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
  return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
  line.

- Add some comments.
2020-01-29 15:47:51 +00:00
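
A hedged sketch of the lookup the tree enables; radix_tree_lookup_node() is the radixtree(9) interface, while the p_lwptree field name and the helper are assumptions:

    #include <sys/proc.h>
    #include <sys/lwp.h>
    #include <sys/radixtree.h>

    /* Hedged sketch: direct LID -> LWP lookup instead of scanning p_lwps. */
    static struct lwp *
    lwp_by_lid_sketch(struct proc *p, lwpid_t lid)
    {
            /* Caller is assumed to hold p->p_lock. */
            return radix_tree_lookup_node(&p->p_lwptree, (uint64_t)lid);
    }
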
ad
6c8b9827ed - lwp_wait(): if the process is exiting and no progress is being made, wake
every clock tick and kick all the LWPs again.

- lwp_create(): copy the LW_WEXIT etc flags while holding the parent's
  p_lock.  Copy only LW_WREBOOT in the case of fork(), since a pending
  coredump or exit() in the parent process isn't for the child.
2020-01-27 21:58:16 +00:00
ad
059ae07ae1 Update a comment. 2020-01-26 19:06:24 +00:00
ad
20f33b0230 Catch a leaked hold of kernel_lock sooner with DIAGNOSTIC and make the
message a bit more informative.
2020-01-22 12:23:04 +00:00
ad
f7bef27025 Remove some unneeded kernel_lock handling. 2020-01-12 13:15:08 +00:00
ad
2ddceed1d9 Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
  calling cpu_switchto().  It's not safe to let other actors mess with the
  LWP (in particular l->l_cpu) while it's still context switching.  This
  removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
  it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead.  Everything
  is in cache anyway so it wasn't buying much by trying to avoid saving old
  state.  This means cpu_switchto() will never be called with prevlwp ==
  NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.
2020-01-08 17:38:41 +00:00
ad
4477d28d73 Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system.  Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().
2019-12-06 21:36:10 +00:00
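
A hedged sketch of the new calling convention; the exact lock order is an assumption based on the two changes listed above:

    #include <sys/proc.h>
    #include <sys/sched.h>

    /* Hedged sketch: the caller, not mi_switch(), takes the run-queue
       lock, which is what lets a caller pick another CPU's queue and
       switch to it immediately. */
    static void
    yield_sketch(struct lwp *l)
    {
            lwp_lock(l);
            spc_lock(l->l_cpu);     /* now the caller's responsibility */
            mi_switch(l);           /* assumed to release the locks */
    }
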
riastradh
a0c864ecf3 Rip out pserialize(9) logic now that the RCU patent has expired.
pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler.  Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.
2019-12-03 05:07:48 +00:00
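
With the patented grace-period machinery gone, the write side reduces to a cross-call barrier. A hedged sketch of the simplified core, using the xcall(9) interface (not the literal committed code):

    #include <sys/xcall.h>

    /* Hedged sketch: once every CPU has run a high-priority cross-call,
       no reader can still be inside a pre-existing read section. */
    static void
    pserialize_perform_sketch(void)
    {
            xc_barrier(XC_HIGHPRI);
    }
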
ad
c242783135 Fix a longstanding problem with LWP limits. When changing the user's
LWP count, we must use the process credentials because that's what
the accounting entity is tied to.

Reported-by: syzbot+d193266676f635661c62@syzkaller.appspotmail.com
2019-12-01 15:27:58 +00:00
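
A hedged sketch of the rule being fixed: charge the per-user LWP count to the process credential's uid; kauth_cred_getuid() and chglwpcnt() are existing interfaces, while the helper and header placement are assumptions:

    #include <sys/kauth.h>
    #include <sys/proc.h>
    #include <sys/resourcevar.h>

    /* Hedged sketch: account a new LWP against the process's uid,
       not against whatever credential the current LWP carries. */
    static void
    charge_lwp_sketch(struct proc *p)
    {
            (void)chglwpcnt(kauth_cred_getuid(p->p_cred), 1);
    }
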
ad
bacf374405 lwp_start(): don't try to change the target CPU. Fixes potential panic
in setrunnable(). Oops, experimental change that escaped.
2019-11-24 13:23:57 +00:00
ad
7b708f2a89 Put section attribute for turnstile0 in the correct place. For LLVM. 2019-11-24 13:14:23 +00:00
ad
11ba4e1830 Minor scheduler cleanup:
- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
  sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
2019-11-23 19:42:52 +00:00
ad
0e70dcbe0f lwp_setlock(): return pointer to the kmutex_t that we replaced 2019-11-21 19:47:21 +00:00
ad
f15dda4bcb lwp_create:
- Don't need to check for PK_SYSTEM when inheriting an affinity mask.
- Inherit processor set ID under proc_lock, to sync with pset syscalls.
2019-11-21 18:22:05 +00:00
ad
e57dd2ba56 - lwp_need_userret(): only do it if ONPROC and !curlwp, and explain why.
- Use signotify() in a couple more places.
2019-11-21 18:17:36 +00:00
maxv
10c5b02320 Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
 - "shad", to track uninitialized memory with a bit granularity (1:1).
   Each bit set to 1 in the shad corresponds to one uninitialized bit of
   real kernel memory.
 - "orig", to track the origin of the memory with a 4-byte granularity
   (1:1). Each uint32_t cell in the orig indicates the origin of the
   associated uint32_t of real kernel memory.

The memory consumption of these shadows is substantial, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Unlike kASan, kMSan requires comprehensive coverage, i.e. we cannot
tolerate having even one non-instrumented function, because this could
cause false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Again unlike kASan, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
 - a code designating the type of memory (Stack, Pool, etc), and
 - a compressed pointer, which points either (1) to a string containing
   the name of the variable associated with the cell, or (2) to an area
   in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory to associate
information with each cell, and produces precise output that can tell,
for example, the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way similar to kASan and kCSan, so
architectures other than amd64 can be supported.
2019-11-14 16:23:52 +00:00
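
To make the shad/orig mechanics concrete, a hedged illustration of what the instrumentation conceptually does around a branch on an uninitialized variable; the expansion in the comment is conceptual, not the exact code the compiler emits:

    int
    f(void)
    {
            int x;  /* never written: its shad bits remain set */

            /*
             * Conceptually the compiler rewrites the branch as:
             *
             *      if (shad_of(x) != 0)
             *              __msan_warning();  // control flow depends on
             *                                 // uninitialized bits
             *
             * and the orig cell for x lets the report name the variable
             * and the function that pushed it on the stack.
             */
            if (x)
                    return 1;
            return 0;
    }
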
joerg
ffac73eb32 Ensure that the second LWP of a new process uses a free LWP ID.
Document overflow behavior.
2019-11-10 23:39:03 +00:00
joerg
280b4162f9 Preserve the LWP ID of the calling thread on (v)fork. This ensures that
_lwp_self() remains invariant as necessary for the locking in the
dynamic linker. Otherwise if a process creates a thread and forks from
it, the main thread of the parent would share the LWP ID of the main
thread of the child, even though they have different origins.

Partial fix for pkg/54192.
2019-11-07 19:45:18 +00:00
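
A hedged userland sketch of the invariant being established; the assertion shows the expected behavior on a kernel with this fix:

    #include <assert.h>
    #include <pthread.h>
    #include <unistd.h>
    #include <lwp.h>

    static void *
    forker(void *arg)
    {
            lwpid_t before = _lwp_self();   /* a non-main thread's ID */

            (void)arg;
            if (fork() == 0) {
                    /* The child's only thread keeps the forker's LWP ID,
                       so ld.elf_so locks keyed on _lwp_self() stay sane. */
                    assert(_lwp_self() == before);
                    _exit(0);
            }
            return NULL;
    }

    int
    main(void)
    {
            pthread_t t;

            pthread_create(&t, NULL, forker, NULL);
            pthread_join(t, NULL);
            return 0;
    }
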
uwe
edcef67ec2 xc_barrier - convenience function to xc_broadcast() a nop.
Make the intent clearer and also avoid a bunch of (xcfunc_t)nullop
casts that gcc 8 -Wcast-function-type is not happy about.
2019-10-06 15:11:16 +00:00
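
A hedged sketch of the convenience function's shape; xc_broadcast() and xc_wait() are the existing xcall(9) interface, and a properly-typed nop avoids the function-pointer cast:

    #include <sys/xcall.h>

    /* A nop with the xcfunc_t signature: no cast needed. */
    static void
    xc_nop(void *arg1, void *arg2)
    {
            (void)arg1;
            (void)arg2;
    }

    /* Hedged sketch: broadcast the nop and wait for every CPU to run it. */
    static void
    xc_barrier_sketch(unsigned int flags)
    {
            xc_wait(xc_broadcast(flags, xc_nop, NULL, NULL));
    }
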
kamil
a35a4fe3b8 Separate flag for suspended by _lwp_suspend and suspended by a debugger
Once a thread has been stopped with ptrace(2), the userland process must
not be able to unstop it deliberately or by accident.

This was a Windows-style behavior that made tracing threads fragile.
2019-10-03 22:48:44 +00:00
kamil
5e4bbc4985 Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo
Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time
overwrote one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
requesting the value when no appropriate event has been registered.

Add an alternative approach to check the ptrace_state information directly
from the siginfo_t value returned by PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compatibility with older NetBSD
and OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.
2019-09-30 21:13:33 +00:00
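
A hedged userland sketch of the siginfo-based approach; PT_GET_SIGINFO and struct ptrace_siginfo are existing NetBSD ptrace(2) interfaces, and error handling is elided:

    #include <sys/types.h>
    #include <sys/ptrace.h>
    #include <stdio.h>

    /* Hedged sketch: after the traced child stops, read the event
       details carried in siginfo rather than in struct proc. */
    static void
    report_stop(pid_t child)
    {
            struct ptrace_siginfo psi;

            ptrace(PT_GET_SIGINFO, child, &psi, sizeof(psi));
            printf("signo=%d code=%d lwp=%d\n",
                psi.psi_siginfo.si_signo, psi.psi_siginfo.si_code,
                (int)psi.psi_lwpid);
    }
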
kamil
bcb2d04797 Stop trying to inform debugger about events from an exiting child
Do not emit signals to the parent if a process is exiting:

 - fork/vfork/similar
 - lwp created/exited
 - exec
 - syscall entry/exit

With these changes Go applications can be traced under a debugger without
a clash, at least without always deadlocking. The culprit was an attempt
to inform a debugger, in the middle of an exit1() call, about a dying LWP.
Go applications perform exit(2) without collecting threads first. Verified
with GDB and picotrace-based utilities like sigtracer.

PR kern/53120
PR port-arm/51677
PR bin/54060
PR bin/49662
PR kern/52548
2019-06-04 11:54:03 +00:00
ozaki-r
7fc219a5ee Implement an aggressive psref leak detector
It is yet another psref leak detector; it can tell where a leak occurs,
whereas the simpler version already committed only reports that a leak
occurred.

Investigating psref leaks is hard because once a leak occurs, the per-CPU
list of psrefs that tracks references can be corrupted.  A reference to a
tracked object is recorded in the list via an intermediate object (struct
psref) that is normally allocated on the stack of a thread.  Thus, the
intermediate object can be overwritten on a leak, resulting in corruption
of the list.

The tracker makes a shadow entry for each intermediate object and stores
some hints in it (currently the caller address of psref_acquire).  We can
detect a leak by checking the entries at certain points where all
references should have been released, such as the return point of syscalls
and the end of each softint handler.

The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.

Proposed on tech-kern
2019-05-17 03:34:26 +00:00
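
A hedged sketch of the shadow-entry idea: record a hint at acquire time in a parallel structure that the on-stack struct psref cannot clobber; every name below is an assumption, not the committed code:

    /* Hedged sketch of the PSREF_DEBUG bookkeeping. */
    struct psref_shadow {
            void    *prs_psref;     /* the on-stack struct psref */
            void    *prs_caller;    /* where it was acquired */
    };

    static void
    psref_shadow_acquire_sketch(struct psref_shadow *shadow, void *psref)
    {
            shadow->prs_psref = psref;
            /* The stored hint: the acquirer's return address, so a leak
               report can say where the reference was taken. */
            shadow->prs_caller = __builtin_return_address(0);
    }
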
kamil
efd4138069 Register KTR events for debugger related signals
Register signals for:

 - crashes (FPE, SEGV, ILL, BUS)
 - LWP events
 - CHLD (FORK/VFORK/VFORK_DONE) events -- temporarily disabled
 - EXEC events

While there, refactor related functions in order to simplify the code.

Add missing comment documentation for recently added kernel functions.
2019-05-03 22:34:21 +00:00
kamil
ac37cdce0c Introduce fixes for ptrace(2)
Stop disabling LWP create and exit events for PT_SYSCALL tracing.
PT_SYSCALL disabled EXEC reporting for legacy reasons; there is no need
to repeat that for LWP and CHLD events.

Pass full siginfo from trapsignal events (SEGV, BUS, ILL, TRAP, FPE).
This adds missing information about signals like fault address.

Set ps_lwp always.

Before passing siginfo to userland through p_sigctx.ps_info, make sure
that its unused bytes are zeroed. LWP and CHLD events do not set si_addr
and si_trap; those pieces of information are passed for crashes (like a
software breakpoint).

LLDB crash reporting now works correctly:

(lldb) r
Process 552 launched: '/tmp/a.out' (x86_64)
Process 552 stopped
* thread #1, stop reason = signal SIGSEGV: invalid address (fault address: 0x123456)
2019-05-02 22:23:49 +00:00
kamil
d1fa1f15ea Correct passing debugger related events for LWP create and exit
Add MI toplevel startlwp function.

Switch all userland LWPs to go through lwp_create using an mi_startlwp()
function shared between all MD ABIs.

Add debugger related event handling in mi_startlwp() and continue with
standard p->p_emul->e_startlwp at the end of this routine.

Use eventswitch() to report the LWP exit event in lwp_exit().

ATF ptrace(2) tests signal9 and signal10 now pass.
2019-05-01 21:57:34 +00:00
ozaki-r
3843688c40 Implement a simple psref leak detector
It detects leaks by counting the number of psrefs held by an LWP and
checking that the count is zero at the end of syscalls and softint
handlers.  For the counter, an unused field of struct lwp is reused.

The detector runs only if DIAGNOSTIC is turned on.
2019-04-19 01:52:55 +00:00
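
A hedged sketch of the bookkeeping; l_psrefs is an assumed name for the reused struct lwp field:

    #include <sys/lwp.h>
    #include <sys/systm.h>

    /* Hedged sketch: psref_acquire() increments, psref_release()
       decrements, and each safe point demands a zero balance. */
    static void
    psref_leak_check_sketch(void)
    {
            /* Called at the end of a syscall or softint handler:
               any psref still held here is a leak. */
            KASSERT(curlwp->l_psrefs == 0);
    }
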
hannken
72421a1974 Move pointer to fstrans private data into "struct lwp".
Ride NetBSD 8.99.35
2019-03-01 09:02:03 +00:00
skrll
c8304d84f0 Use cpu_index(). NFC. 2018-11-26 17:18:01 +00:00
kamil
1d52842d90 Avoid undefined behavior in lwp_ctl_free()
Do not left-shift a signed integer in a way that changes the sign bit.

sys/kern/kern_lwp.c:1892:29, left shift of 1 by 31 places cannot be represented in type 'int'

Detected with Kernel Undefined Behavior Sanitizer.

Reported by <Harry Pantazis>
2018-07-04 18:15:27 +00:00
kamil
85b6812c20 Avoid undefined behavior in lwp_ctl_alloc()
Do not left-shift a signed integer in a way that changes the sign bit.

sys/kern/kern_lwp.c:1849:27, left shift of 1 by 31 places cannot be represented in type 'int'

Detected with Kernel Undefined Behavior Sanitizer.

Reported by <Harry Pantazis>
2018-07-04 18:13:01 +00:00
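
For reference, a hedged illustration of this class of fix; the exact expression in kern_lwp.c may differ:

    #include <stdint.h>

    void
    shift_example(void)
    {
            /* Undefined: 1 << 31 shifts a bit into the sign bit of int. */

            /* Well-defined: shift an unsigned operand instead. */
            uint32_t bit = UINT32_C(1) << 31;
            (void)bit;
    }
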