the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.
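A minimal sketch of the pattern (illustrative, not the exact in-tree
code), assuming the caller already holds p->p_lock:

#include <sys/param.h>
#include <sys/proc.h>

static struct lwp *
lookup_lwp_stable(struct proc *p, lwpid_t lid)
{
        struct lwp *l;

        KASSERT(mutex_owned(p->p_lock));
        LIST_FOREACH(l, &p->p_lwps, l_sibling) {
                if (l->l_lid != lid)
                        continue;
                lwp_lock(l);            /* takes the spin mutex l->l_mutex */
                if (l->l_stat == LSIDL) {
                        /* Not yet visible by ID; treat as not found. */
                        lwp_unlock(l);
                        l = NULL;
                } else
                        lwp_unlock(l);
                break;
        }
        return l;
}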
- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state would need to be understood by top(1), the tty subsystem and so
on, and would attract the attention of 3rd-party kernel grovellers in
time, so remove it and just rely on LSIDL.
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
  *and* a proc given the ID of any LWP; a usage sketch follows this
  list. Returns with the proc::p_lock held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
  allow the proc to be wildcarded, rather than just curproc or a
  specific proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.
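A usage sketch for proc_find_lwp_acquire_proc(); the signature shown is
inferred from the description above rather than quoted from the tree:

static void
example(pid_t lid)
{
        struct proc *p;
        struct lwp *l;

        l = proc_find_lwp_acquire_proc(lid, &p);
        if (l != NULL) {
                /* Both l and p are valid here; p->p_lock is held. */
                mutex_exit(p->p_lock);
        }
}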
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.
In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.
Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.
Nudged in this direction by ad@ and chs@.
identifier uniquely identifies an LWP across the entire system, and will
be used in future improvements in user-space synchronization primitives.
(Test disabled and libc stub not included intentionally so as to avoid
multiple libc version bumps.)
There should perhaps be a pcu_fork operation to keep this factored
neatly, but this will be simpler to pull up.
In practical terms, this may not affect most architectures that use
pcu(9) -- alpha, arm32, mips, powerpc, riscv -- but it does affect
aarch64, where v8-v15 are callee-saved, and GCC actually takes
advantage of them, for more than just floating-point data too.
XXX pullup
single-threaded case. Replace scans of p->p_lwps with lookups in the
tree (a lookup is sketched after this list). Find free LIDs for new
LWPs in the tree. Replace the hashed sleep queues for park/unpark with
lookups in the tree under cover of an RW lock.
- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.
- Group the locks in struct proc at the end of the struct in their own cache
line.
- Add some comments.
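A sketch of a lookup in the embedded tree (the p_lwptree field name is
assumed here for illustration):

#include <sys/radixtree.h>

static struct lwp *
lwp_by_lid(struct proc *p, lwpid_t lid)
{
        KASSERT(mutex_owned(p->p_lock));
        return radix_tree_lookup_node(&p->p_lwptree, lid);
}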
every clock tick and kick all the LWPs again.
- lwp_create(): copy the LW_WEXIT etc flags while holding the parent's
p_lock. Copy only LW_WREBOOT in the case of fork(), since a pending
coredump or exit() in the parent process isn't for the child.
where curcpu() is defined as curlwp->l_cpu:
- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.
- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.
- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.
- Remove some KERNEL_LOCK handling which hasn't been needed for years.
This seems to take about 3us on my Intel system. Two changes required:
- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().
While here:
- Add a couple of calls to membar_enter().
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().
pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.
From rmind@, tidied up by me.
- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.
We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.
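For illustration, a check against the bit-granularity shadow could look
like this (kmsan_mem_to_shad() is a hypothetical helper name):

static bool
kmsan_initialized(const uint8_t *addr, size_t len)
{
        const uint8_t *shad = kmsan_mem_to_shad(addr);
        size_t i;

        for (i = 0; i < len; i++) {
                if (shad[i] != 0)       /* any set bit marks uninit data */
                        return false;
        }
        return true;
}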
The memory consumption of these shadows is considerable, so at least 4GB of
RAM is recommended to run kMSan.
The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).
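For example, in the following (illustrative) function the inserted
checks fire on the branch, because its outcome depends on uninitialized
memory:

int
f(void)
{
        int x;          /* never initialized */

        if (x != 0)     /* kMSan reports here */
                return 1;
        return 0;
}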
We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.
Unlike kASan, kMSan requires comprehensive coverage, i.e. we cannot
tolerate having even one non-instrumented function, because this could
cause false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.
Again unlike kASan, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.
The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.
This encoding avoids consuming extra memory to associate information
with each cell, and produces precise output that can tell, for example,
the name of an uninitialized variable on the stack, the function in
which it was pushed on the stack, and the function where we accessed
this uninitialized variable.
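A sketch of the packing, with made-up type codes, shift width and names
rather than the actual kMSan layout:

#define ORIG_TYPE_SHIFT 30      /* top bits hold the memory-type code */
#define ORIG_TYPE_STACK 1U
#define ORIG_TYPE_POOL  2U

static inline uint32_t
orig_encode(uint32_t type, vaddr_t ptr)
{
        /* Compress the pointer to an offset that fits below the code. */
        uint32_t off = (uint32_t)(ptr - KERNBASE);

        return (type << ORIG_TYPE_SHIFT) |
            (off & ((1U << ORIG_TYPE_SHIFT) - 1));
}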
kMSan is available with LLVM, but not with GCC.
The code is organized in a way similar to kASan and kCSan, so
architectures other than amd64 can be supported.
_lwp_self() remains invariant as necessary for the locking in the
dynamic linker. Otherwise if a process creates a thread and forks from
it, the main thread of the parent would share the LWP ID of the main
thread of the child, even though they have different origins.
Partial fix for pkg/54192.
Once a thread has been stopped with ptrace(2), a userland process must
not be able to unstop it, deliberately or by accident.
This was a Windows-style behavior that made thread tracing fragile.
Storing struct ptrace_state information inside struct proc was
vulnerable to synchronization bugs, as multiple events emitted at the
same time overwrote one another.
Cache the original parent process id in p_oppid. Reusing p_opptr here
is in theory prone to a slight race condition.
Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for
calls requesting the value when no appropriate event has been
registered.
Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.
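For example, a tracer can inspect the event directly from the siginfo
(sketch; error handling trimmed):

#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ptrace.h>

static void
check_lwp_event(pid_t child)
{
        struct ptrace_siginfo psi;

        if (ptrace(PT_GET_SIGINFO, child, &psi, sizeof(psi)) == -1)
                return;
        if (psi.psi_siginfo.si_signo == SIGTRAP &&
            psi.psi_siginfo.si_code == TRAP_LWP)
                printf("LWP event in lwp %d\n", psi.psi_lwpid);
}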
Add a couple of compile-time asserts for assumptions in the code.
No functional change intended in existing ptrace(2) software.
All ATF ptrace(2) and ATF GDB tests pass.
This change improves reliability of the threading ptrace(2) code.
Do not emit signals to the parent if a process is dying:
- fork/vfork/similar
- lwp created/exited
- exec
- syscall entry/exit
With these changes Go applications can be traced under a debugger
without a clash, or at least without always deadlocking. The culprit
was an attempt to inform the debugger about a dying LWP in the middle
of the exit1() call. Go applications perform exit(2) without collecting
their threads first. Verified with GDB and picotrace-based utilities
like sigtracer.
PR kern/53120
PR port-arm/51677
PR bin/54060
PR bin/49662
PR kern/52548
It is yet another psref leak detector, one that can tell where a leak
occurs, while the simpler version already committed only reports that
a leak occurred.
Investigating psref leaks is hard because once a leak occurs, the
percpu list of psrefs that tracks references can be corrupted. A
reference to a tracked object is memorized in the list via an
intermediate object (struct psref) that is normally allocated on a
thread's stack. Thus, the intermediate object can be overwritten on a
leak, resulting in corruption of the list.
The tracker makes a shadow entry for each intermediate object and
stores some hints in it (currently the caller address of
psref_acquire()). We can detect a leak by checking the entries at
certain points where all references should have been released, such as
the return point of syscalls and the end of each softint handler.
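A sketch of what a shadow entry could hold (field names illustrative);
any entry that survives at a checkpoint is a leak, and the stored hint
tells where the leaked reference was acquired:

struct psref_shadow {
        const struct psref *ps_psref;   /* the intermediate object */
        void *ps_caller;                /* caller address of psref_acquire() */
};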
The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.
Proposed on tech-kern
Stop disabling LWP create and exit events for PT_SYSCALL tracing.
PT_SYSCALL disabled EXEC reporting for legacy reasons; there is no need
to repeat that for LWP and CHLD events.
Pass full siginfo from trapsignal events (SEGV, BUS, ILL, TRAP, FPE).
This adds missing information about signals, such as the fault address.
Always set ps_lwp.
Before passing siginfo to userland through p_sigctx.ps_info, make sure
that its unused bytes are zeroed. LWP and CHLD events do not set
si_addr and si_trap; these pieces of information are passed for crashes
(like a software breakpoint).
LLDB crash reporting works now correctly:
(lldb) r
Process 552 launched: '/tmp/a.out' (x86_64)
Process 552 stopped
* thread #1, stop reason = signal SIGSEGV: invalid address (fault address: 0x123456)
Add MI toplevel startlwp function.
Switch all userland LWPs to go through lwp_create() using an
mi_startlwp() function shared between all MD ABIs.
Add debugger-related event handling in mi_startlwp() and continue with
the standard p->p_emul->e_startlwp at the end of this routine.
Use eventswitch() to report the LWP exit event in lwp_exit().
ATF ptrace(2) tests signal9 and signal10 now pass.
It detects leaks by counting the number of psrefs held by an LWP and
checking that the count is zero at the end of syscalls and softint
handlers. An unused field of struct lwp is reused for the counter.
The detector runs only if DIAGNOSTIC is turned on.
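In outline (l_psrefs stands in for the reused field, whose actual name
may differ):

curlwp->l_psrefs++;             /* in psref_acquire() */
curlwp->l_psrefs--;             /* in psref_release() */

/* At syscall return and at the end of each softint handler: */
KASSERT(curlwp->l_psrefs == 0);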
Do not left-shift a signed integer in a way that changes the sign bit.
sys/kern/kern_lwp.c:1892:29, left shift of 1 by 31 places cannot be represented in type 'int'
Detected with Kernel Undefined Behavior Sanitizer.
Reported by <Harry Pantazis>
Do not left-shift a signed integer in a way that changes the sign bit.
sys/kern/kern_lwp.c:1849:27, left shift of 1 by 31 places cannot be represented in type 'int'
Detected with Kernel Undefined Behavior Sanitizer.
Reported by <Harry Pantazis>
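The usual shape of the fix for this class of report is to shift an
unsigned constant instead; illustrative, not the exact change:

#define SOME_FLAG       (1U << 31)      /* was (1 << 31): undefined */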