SCHED_FIFO threads:
- Change condvar sync so that we never take the condvar's spinlock without
first holding the caller-provided mutex. Previously, the spinlock was only
taken without the mutex in an error path, but it was enough to trigger the
problem described in the PR.
- Even with this change, applications calling pthread_cond_signal/broadcast
  without holding the interlocking mutex are still subject to the problem
  described in the PR. POSIX discourages this, saying that it leads to
  undefined scheduling behaviour; that seems good enough for the time being.
  The pattern POSIX favours is sketched below.
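A minimal sketch of that pattern, using only standard pthread calls
(error handling omitted):

    #include <pthread.h>

    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static int ready;

    void
    produce(void)
    {
            pthread_mutex_lock(&mtx);
            ready = 1;
            /* Signal with the interlocking mutex held: the library
             * never touches the condvar without the mutex, and a
             * waiter cannot observe a half-updated predicate. */
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&mtx);
    }

    void
    consume(void)
    {
            pthread_mutex_lock(&mtx);
            while (!ready)
                    pthread_cond_wait(&cond, &mtx);
            pthread_mutex_unlock(&mtx);
    }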
- Elsewhere, use a hash of mutexes instead of per-object spinlocks to
synchronize entry/exit from sleep queues.
- Simplify how sleep queues are maintained.
- Override __libc_thr_init() instead of using our own constructor.
- Add pthread__getenv() and use it instead of getenv(). This is used before
  we are up and running, and unfortunately getenv() takes locks.
Other changes:
- Cache the spinlock vectors in pthread__st. Internal spinlock operations
now take 1 function call instead of 3 (i386).
- Use pthread__self() internally, not pthread_self().
- Use __attribute__ ((visibility("hidden"))) in some places.
- Kill PTHREAD_MAIN_DEBUG.
- Defer wakeups until the interlocking mutex has been released, so that
  e.g. the following does not wake other threads early:
    pthread_mutex_lock(&mutex);
    pthread_cond_broadcast(&cond);
    foo = malloc(100);          /* takes libc mutexes */
    pthread_mutex_unlock(&mutex);
- Update some comments and fix minor bugs. Minor cosmetic changes.
- Replace some spinlocks with mutexes and rwlocks.
- Change the process private semaphores to use mutexes and condition
variables instead of doing the synchronization directly. Spinlocks
are no longer used by the semaphore code.
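Schematically, a semaphore built from a mutex and a condition variable
looks like the following (hypothetical names, not the actual sem_t
layout; error handling omitted):

    #include <pthread.h>

    struct sem_sketch {
            pthread_mutex_t mtx;
            pthread_cond_t  cv;
            unsigned        count;
    };

    void
    sem_post_sketch(struct sem_sketch *s)
    {
            pthread_mutex_lock(&s->mtx);
            s->count++;
            pthread_cond_signal(&s->cv);
            pthread_mutex_unlock(&s->mtx);
    }

    void
    sem_wait_sketch(struct sem_sketch *s)
    {
            pthread_mutex_lock(&s->mtx);
            while (s->count == 0)
                    pthread_cond_wait(&s->cv, &s->mtx);
            s->count--;
            pthread_mutex_unlock(&s->mtx);
    }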
- Make the deferred wakeup list a per-thread array and pass the lwpid_t's
  down that way; see the sketch below.
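Roughly the idea, with hypothetical field names, assuming NetBSD's
_lwp_unpark_all(2) to flush the batch:

    #include <stddef.h>
    #include <lwp.h>

    #define NWAITERS_MAX 32

    struct thread_sketch {
            lwpid_t pt_waiters[NWAITERS_MAX]; /* deferred wakeups */
            size_t  pt_nwaiters;
    };

    /* Record a wakeup instead of issuing a syscall right away.  A
     * real implementation would flush when the array fills. */
    static void
    defer_wakeup(struct thread_sketch *self, lwpid_t lid)
    {
            self->pt_waiters[self->pt_nwaiters++] = lid;
    }

    /* Once the mutex has been released, wake the batch in one go. */
    static void
    flush_wakeups(struct thread_sketch *self)
    {
            if (self->pt_nwaiters != 0) {
                    (void)_lwp_unpark_all(self->pt_waiters,
                        self->pt_nwaiters, NULL);
                    self->pt_nwaiters = 0;
            }
    }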
- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
In this way there should never be contention on the CV's spinlock if
the app follows POSIX rules (there should only be contention on the
user-provided mutex).
- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
there is contention. This is enabled where atomic ops are available. Right
now that is only i386 and amd64 because I don't have other hardware to
test with. It's trivial to add stubs for other architectures as long as
they have compare-and-swap. When we have proper atomic ops the old rwlock
code can be removed.
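The reader fast path in that style can be sketched with C11 atomics
(hypothetical bit layout: low bits for flags, the rest a reader count;
an illustration, not the actual code):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define RW_WRITE_LOCKED 0x01UL  /* a writer owns the lock */
    #define RW_HAS_WAITERS  0x02UL  /* sleepers; take the slow path */
    #define RW_READ_INCR    0x04UL  /* one reader */

    static bool
    rw_tryenter_read(_Atomic unsigned long *owner)
    {
            unsigned long old = atomic_load(owner);

            /* Uncontended: bump the reader count with CAS and never
             * touch the rwlock's spinlock. */
            while ((old & (RW_WRITE_LOCKED | RW_HAS_WAITERS)) == 0) {
                    if (atomic_compare_exchange_weak(owner, &old,
                        old + RW_READ_INCR))
                            return true;
            }
            return false;  /* contended: caller takes the slow path */
    }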
- Add a new mutex implementation that's similar to the kernel's mutexes, but
uses compare-and-swap to maintain the waiters list, so no spinlocks are
involved. Same caveats apply as for the rwlocks.
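A sketch of both pieces with C11 atomics (hypothetical names and
layout):

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Fast path: CAS our thread pointer into a NULL owner word. */
    static bool
    mutex_tryacquire(_Atomic(void *) *owner, void *self)
    {
            void *expected = NULL;
            return atomic_compare_exchange_strong(owner, &expected,
                self);
    }

    struct waiter {
            struct waiter *next;
    };

    /* Slow path helper: push ourselves onto a lock-free waiters
     * stack; CAS alone maintains the list, so no spinlock. */
    static void
    push_waiter(_Atomic(struct waiter *) *head, struct waiter *w)
    {
            w->next = atomic_load(head);
            while (!atomic_compare_exchange_weak(head, &w->next, w))
                    ;       /* w->next was refreshed with the head */
    }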
- Have the park interface additionally accept a thread to wake and a
  hint pointer, but do so in a way that remains compatible with older
  pthread libraries. This can be used to wake another thread before the
  calling thread goes to sleep, saving at least one syscall + involuntary
  context switch. This turns out to be a fairly large win on the condvar
  benchmarks that I have tried.
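In outline, assuming the four-argument _lwp_park() of this era (later
releases grew clock and flags arguments, so treat the signature as an
assumption and check lwp.h):

    #include <stddef.h>
    #include <lwp.h>

    /* Park on 'sleepq' and, in the same syscall, wake 'other',
     * presumably parked on 'wakeq'.  Doing both at once saves a
     * syscall plus an involuntary context switch. */
    static void
    park_and_wake(lwpid_t other, const void *sleepq, const void *wakeq)
    {
            (void)_lwp_park(NULL, other, sleepq, wakeq);
    }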
detach/join.
- Make mutex acquire spin for a short time, as done with spinlocks.
- Make the number of spins controllable with the env var PTHREAD_NSPINS.
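The acquire path then looks roughly like this (hypothetical names;
pthread__nspins would be parsed from PTHREAD_NSPINS, and
park_on_mutex() stands in for the real sleep path):

    #include <stdatomic.h>

    static int pthread__nspins = 1000; /* PTHREAD_NSPINS overrides */

    void park_on_mutex(_Atomic(void *) *owner);    /* hypothetical */

    static void
    mutex_acquire(_Atomic(void *) *owner, void *self)
    {
            for (;;) {
                    /* Spin briefly, hoping the holder lets go soon. */
                    for (int i = 0; i < pthread__nspins; i++) {
                            void *expected = NULL;
                            if (atomic_compare_exchange_weak(owner,
                                &expected, self))
                                    return;
                    }
                    /* No luck: sleep in the kernel, then retry. */
                    park_on_mutex(owner);
            }
    }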
- Reduce the amount of time that libpthread internal spinlocks are held.
- Rely more on the barrier effects of park/unpark to avoid taking spinlocks.
- Simplify the locking around pthreads and the global queues.
- Align per-thread sync data on a 128-byte boundary.
- Offset thread stacks by a small amount to try and reduce cache thrash.
- After resuming execution, the thread must check to see if it has been
  restarted as a result of pthread_cond_signal(). If it has, but cannot
  take the wakeup (because of e.g. a pending Unix signal or timeout),
  then try to ensure that another thread sees it. This is necessary
  because there may be multiple waiters, and at least one should take
  the wakeup if possible. A sketch of the idea follows.
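One way to picture it (hypothetical helper; the flags stand in for
state the library tracks internally):

    #include <pthread.h>
    #include <stdbool.h>

    /* Called after the thread resumes from its sleep.  'signalled'
     * is true if pthread_cond_signal() restarted us; 'can_take' is
     * false on a timeout or a pending Unix signal. */
    static void
    maybe_relay_wakeup(pthread_cond_t *cond, bool signalled,
        bool can_take)
    {
            if (signalled && !can_take) {
                    /* We absorbed a wakeup we cannot use; pass it
                     * on so another waiter, if any, is woken. */
                    (void)pthread_cond_signal(cond);
            }
    }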
- Unnecessary syscalls are expensive, so avoid making them.
- When parking an LWP on a condition variable, point the hint argument at
the mutex's waiters queue. Chances are we will be awoken from that later.
- When waking waiters on a condition variable, note the mutex that the
  waiters are using to synchronise, then transfer them to the mutex's
  waiters list so that the wakeup is deferred until release of the mutex.
  Improves the timings for CV sleep/wakeup by 30-100% in tests conducted
  locally on a UP system. There can be a penalty for MP systems when only
  one thread is being awoken, but in practice I think it won't be an
  issue.
- pthread_cond_signal(): search for a thread that does not have a pending
  wakeup. Threads can have a pending wakeup and still be on the waiters
  list if we clash with an earlier pthread_cond_broadcast().
- On wakeup, remove the thread from the CV's waiters list in all cases,
  not just on cancellation; there are other sources of spurious wakeups,
  such as single-stepping in the debugger.
regress/lib/libpthread/conddestroy1 now passes.
- Fix the interaction between condition variable sleeps and cancellation:
* Arrange to not set ptc_mutex until after the pre-sleep cancellation
test.
* In the post-sleep cancellation test, check if there are no more
sleepers and clear ptc_mutex if so.
While here, sprinkle some __predict_false() around the cancellation
tests.
- Change the response to operations that officially have undefined
  behavior, from returning an error code to raising an assertion failure.
Also, don't bother to explicitly test for (illegal) null pointers and return
an error; they'll bomb out soon enough.
- Insert threads into the front of the sleep queue rather than the back.
  This is more cache-friendly behavior and within the (lack of)
  constraints on wakeup ordering imposed on equal-priority threads; see
  the sketch below.
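With the usual queue macros this is just an insert at the head; a
sketch using <sys/queue.h>:

    #include <sys/queue.h>

    struct waiter {
            TAILQ_ENTRY(waiter) entry;
    };
    TAILQ_HEAD(sleepq, waiter);

    /* LIFO insert: the most recently blocked thread, whose data is
     * most likely still cache-hot, is also the first one woken. */
    static void
    sleepq_insert(struct sleepq *sq, struct waiter *w)
    {
            TAILQ_INSERT_HEAD(sq, w, entry);
    }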
- Handle the case where pthread_cond_timedwait() is called before any
  threads have been created and the SA infrastructure is up and running.
Addresses PR lib/20139.
XXX probably need to do this for all of the pthread_*_timedlock()
functions, too.