non-interlocked CAS in the fast unlock path -- it is unsafe to test for
the waiters-bit while the owner thread is running, we have to spin for
the owner or its state change to be sure about the presence of the bit.
Split off the logic into the pthread__mutex_setwaiters() routine.
This is a partial fix to the named lockup problem (also see PR/44756).
It seems there is another race which can be reproduced on faster CPUs.
has two parts:
- in pthread_cond_timedwait() if the thread we are trying to unpark
exited, retry the the _lwp_park call without it.
- pthread_mutex() was affecting errno since it is calling _lwp_park()
from pthread_mutex_lock_slow(). preserve the original errno.
Note that the example problem still causes an occassional deadlock on machines
with many CPUs and it is the same deadlock we observe with named.
- Fail if the dlopened libpthread does pthread_create(). From manu@
- Discussed at length in the mailing lists; approved by core@
- This was chosen as the least intrusive patch that will provide
the necessary functionality.
XXX: pullup to 6
pthread__smt_pause. These are implemented using the ARM instructions
SEV (wake) and WFE (pause). These are treated as NOPs on ARM CPUs that
don't support them.
SCHED_FIFO threads
- Change condvar sync so that we never take the condvar's spinlock without
first holding the caller-provided mutex. Previously, the spinlock was only
taken without the mutex in an error path, but it was enough to trigger the
problem described in the PR.
- Even with this change, applications calling pthread_cond_signal/broadcast
without holding the interlocking mutex are still subject to the problem
described in the PR. POSIX discourages this saying that it leads to
undefined scheduling behaviour, which seems good enough for the time being.
- Elsewhere, use a hash of mutexes instead of per-object spinlocks to
synchronize entry/exit from sleep queues.
- Simplify how sleep queues are maintained.
- Add new functions: pthread_mutex_held_np, mutex_owner_np, rwlock_held_np,
rwlock_wrheld_np, rwlock_rdheld_np. These match the kernel's locking
primitives and can be used when porting kernel code to userspace.
- Always create LWPs detached. Do join/exit sync mostly in userland. When
looped on a dual core box this seems ~30% quicker than using lwp_wait().
Reduce number of lock acquire/release ops during thread exit.
- Override __libc_thr_init() instead of using our own constructor.
- Add pthread__getenv() and use instead of getenv(). This is used before
we are up and running and unfortunatley getenv() takes locks.
Other changes:
- Cache the spinlock vectors in pthread__st. Internal spinlock operations
now take 1 function call instead of 3 (i386).
- Use pthread__self() internally, not pthread_self().
- Use __attribute__ ((visibility("hidden"))) in some places.
- Kill PTHREAD_MAIN_DEBUG.
the following do not wake other threads early:
pthread_mutex_lock(&mutex);
pthread_cond_broadcast(&cond);
foo = malloc(100); /* takes libc mutexes */
pthread_mutex_unlock(&mutex);
- Update some comments and fix minor bugs. Minor cosmetic changes.
- Replace some spinlocks with mutexes and rwlocks.
- Change the process private semaphores to use mutexes and condition
variables instead of doing the synchronization directly. Spinlocks
are no longer used by the semaphore code.
Instead, make the deferred wakeup list a per-thread array and pass down
the lwpid_t's that way.
- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
In this way there should never be contention on the CV's spinlock if
the app follows POSIX rules (there should only be contention on the
user-provided mutex).
- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
there is contention. This is enabled where atomic ops are available. Right
now that is only i386 and amd64 because I don't have other hardware to
test with. It's trivial to add stubs for other architectures as long as
they have compare-and-swap. When we have proper atomic ops the old rwlock
code can be removed.
- Add a new mutex implementation that's similar to the kernel's mutexes, but
uses compare-and-swap to maintain the waiters list, so no spinlocks are
involved. Same caveats apply as for the rwlocks.
detach/join.
- Make mutex acquire spin for a short time, as done with spinlocks.
- Make the number of spins controllable with the env var PTHREAD_NSPINS.
- Reduce the amount of time that libpthread internal spinlocks are held.
- Rely more on the barrier effects of park/unpark to avoid taking spinlocks.
- Simplify the locking around pthreads and the global queues.
- Align per-thread sync data on a 128 byte boundary.
- Offset thread stacks by a small amount to try and reduce cache thrash.
so avoid making them.
- When parking an LWP on a condition variable, point the hint argument at
the mutex's waiters queue. Chances are we will be awoken from that later.
be used only as as a hint. Clear the pointer when releasing the mutex.
- When releasing a mutex, wake all waiters. Makes it possible to tranfer
waiters from another object to a mutex.
mutex thread, thus leaving a thread sleeping on an unlocked mutex.
Reviewed by myself and Christos.
Problem reported by Arne H. Juul, arnej at pvv dot ntnu dot no,
in PR 26208. This fix represents option 1 presented in the PR.
any threads are created turned out to be not such a good idea.
there are stronger requirements on what has to work in a forked child
while a process is still single-threaded. so take all that stuff
back out and fix the problems with single-threaded programs that
are linked with libpthread differently, by checking if the library
has been started and doing completely different stuff if it hasn't been:
- for pthread_rwlock_timedrdlock(), just fail with EDEADLK immediately.
- for sem_wait(), the only thing that can unlock the semaphore is a
signal handler, so use sigsuspend() to wait for a signal.
- for pthread_mutex_lock_slow(), just go into an infinite loop
waiting for signals.
I also noticed that there's a "sem2" test that has never worked in its
single-threaded form. the problem there is that a signal handler tries
to take a sem_t interlock which is already held when the signal is received.
fix this too, by adding a single-threaded case for sig_trywait() that
blocks signals instead of using the userland interlock.
call pthread__start() if it hasn't already been called. this avoids
an internal assertion from the library if these routines are used
before any threads are created and they need to sleep.
fixes PR 20256, PR 24241, PR 25722, PR 26096.
- enable concurrency according to environment variable PTHREAD_CONCURRENCY
- add idle VP wakeup if there are additional jobs and idle VPs
- make reidlequeue per VP
- enable spinning for locks
- fix race condition in alarm processing
- fix race condition in mutex locking
- make debugging output line buffered and add VP prefix to debug lines
officially have undefined behavior, from returning an error code to raising
an assertion failure.
Also, don't bother to explicitly test for (illegal) null pointers and return
an error; they'll bomb out soon enough.
front of the sleep queue rather than the back. This is more
cache-friendly behavior and within the (lack of) constraints on wakeup
ordering imposed on equal-priority threads.