Link to https://github.com/rumpkernel/ instead of
a site now taken over by an SEO squatter.
Per discussion on github.com/rumpkernel issues with pooka.
PR misc/57501
This is now distinct from pthread__smt_pause, which is for spin lock
backoff with no paired wakeup.
On Arm, there is a single-bit event register per CPU, and there are two
instructions to manage it:
- wfe, wait for event -- if event register is clear, enter low power
mode and wait until event register is set; then exit low power mode
and clear event register
- sev, signal event -- sets event register on all CPUs (other
circumstances like interrupts also set the event register and cause
wfe to wake)
These can be used to reduce the power consumption of spinning for a
lock, but only if they are actually paired -- if there's no sev, wfe
might hang indefinitely. Currently only pthread_spin(3) actually
pairs them; the other lock primitives (internal lock, mutex, rwlock)
do not -- they have spin lock backoff loops, but no corresponding
wakeup to cancel a wfe.
It may be worthwhile to teach the other lock primitives to pair
wfe/sev, but that requires some performance measurement to verify
it's actually worthwhile. So for now, we just make sure not to use
wfe when there's no sev, and keep everything else the same -- this
should fix severe performance degredation in libpthread on Arm
without hurting anything else.
No change in the generated code on amd64 and i386. No change in the
generated code for pthread_spin.c on arm and aarch64 -- changes only
the generated code for pthread_lock.c, pthread_mutex.c, and
pthread_rwlock.c, as intended.
PR port-arm/57437
XXX pullup-10
No functional change intended -- just safer to do it this way in case
the macros are used in if branches or comma expressions.
PR port-arm/57437 (pthread__smt_pause/wake issue)
XXX pullup-10
Patch from chs@. Comment explaining the story by me. This patch may
not be optimal -- maybe it would be better in pthread__init, or
better for rtld to call _lwp_unpark after _lwp_park in the contened
case -- but we've tested this version and it's annoying to reproduce,
so let's take this version and worry about testing improvements
later.
Stuff like libc's namespace.h, or atomic_op_namespace.h, which does
namespacing tricks like `#define atomic_cas_uint _atomic_cas_uint',
has to go at the top of each .c file. If it goes in the middle, it
might be too late to affect the declarations, and result in compile
errors.
I tripped over this by including <sys/atomic.h> in mips
<machine/lock.h>.
(Maybe we should create a new pthread_namespace.h file for the
purpose, but this'll do for now.)
1. After loading self->pt_rwlocked, membar_enter() must not be
conditional on PTHREAD__ATOMIC_IS_MEMBAR because there is no
atomic r/m/w operation here which could imply the acquire barrier.
(This should maybe just be a load-acquire operation, but we don't
have atomic_load_acquire in userland at the moment -- TBD.)
2. Before storing thread->pt_rwlocked, must issue membar_exit() so
that this is a store-release operation -- except if we had just
done an atomic r/m/w and PTHREAD__ATOMIC_IS_MEMBAR is set, in
which case it can be elided.
The second membar_exit() added here might be safely hoisted out of
the loop but I'm not sure -- needs more analysis to prove that
would be safe.
thread it is trying to awake. The thread could exit the condvar and then
reinsert itself at the head of the list with a new waiter behind it. It's
likely possible to fix this in a way that's wait-free but for now just fix
the bug.
pthread_t, so there's less chance of bad things happening if someone calls
(for example) pthread_cond_broadcast() from a signal handler.
- Remove all the deferred waiter handling except for the one case that really
matters which is transferring waiters from condvar -> mutex on wakeup, and
do that by splicing the condvar's waiters onto the mutex.
- Remove the mutex waiters bit as it's another complication that's not
strictly needed.
Provide the hook from modern jemalloc to avoid using TSD for the thread
destruction cleanup as it can result in reentrancy crashes if fork is
called from a thread that never called malloc as it will result in a
late malloc from the pre-fork synchronisation handler.
timeouts or cancellation where:
- The restarting thread calls _lwp_exit() before another thread gets around
to waking it with _lwp_unpark(), leading to ESRCH (observed by joerg@).
(I may have removed a similar check mistakenly over the weekend.)
- The restarting thread considers itself gone off the sleep queue but
at the same time another thread is part way through waking it, and hasn't
fully completed that operation yet by setting thread->pt_mutexwait = 0.
I think that could have potentially lead to the list of waiters getting
messed up given the right circumstances.
Centralise wakeup of deferred waiters in pthread__clear_waiters() and use
throughout libpthread. Make fewer assumptions. Be more conservative in
pthread_mutex when dealing with pending waiters.
- Remove the "hint" argument everywhere since the kernel doesn't use it any
more.
Optimistically check whether the key has been used by this thread
already and avoid locking in that case. This avoids the atomic operation
in the hot path. When the value is set to non-NULL for the first time,
put the entry on the to-be-freed list and keep it their until
destruction or thread exit. Setting the key to NULL and back is common
enough and updating the list is more expensive than the extra check on
the final round.
value differs b/w oea/booke/ibm4xx, and there's no way to obtain it
from userland. Therefore, this initial value should be corrected by
cpu_setmcontext().
- Comment this in libpthread
- Add KASSERT in cpu_mcontext_validate()
Separate the pthread_atfork(3) call from pthread_tsd_init()
and move it into a distinct function.
Call inside pthread__init() late TSD initialization route, just after
"pthread_atfork(NULL, NULL, pthread__fork_callback);".
Document that malloc(3) initialization is now controlled again and called
during the first pthread_atfork(3) call.
Remove #if 0 code from pthread_mutex.c as we no longer initialize malloc
prematurely.
On error when not aborting, do not return EINVAL as it has a side effect
of being interpreted as matching threads. For invalid threads return
unmatched.
Check pthreads for NULL, before accessing pt_magic field. This avoids
faults on comparision with a NULL pointer.
This behavior is in the scope of UB, but should be easier to deal with
buggy software.
It is enabled unconditionally since 2003 and used only for rwlocks and
spinlocks.
LLVM sanitizers make assumptions that these checks are enabled always.