SCHED_FIFO threads
- Change condvar sync so that we never take the condvar's spinlock without
first holding the caller-provided mutex. Previously, the spinlock was only
taken without the mutex in an error path, but it was enough to trigger the
problem described in the PR.
- Even with this change, applications calling pthread_cond_signal/broadcast
without holding the interlocking mutex are still subject to the problem
described in the PR. POSIX discourages this saying that it leads to
undefined scheduling behaviour, which seems good enough for the time being.
- Elsewhere, use a hash of mutexes instead of per-object spinlocks to
synchronize entry/exit from sleep queues.
- Simplify how sleep queues are maintained.
- Add new functions: pthread_mutex_held_np, mutex_owner_np, rwlock_held_np,
rwlock_wrheld_np, rwlock_rdheld_np. These match the kernel's locking
primitives and can be used when porting kernel code to userspace.
- Always create LWPs detached. Do join/exit sync mostly in userland. When
looped on a dual core box this seems ~30% quicker than using lwp_wait().
Reduce number of lock acquire/release ops during thread exit.
- Override __libc_thr_init() instead of using our own constructor.
- Add pthread__getenv() and use instead of getenv(). This is used before
we are up and running and unfortunatley getenv() takes locks.
Other changes:
- Cache the spinlock vectors in pthread__st. Internal spinlock operations
now take 1 function call instead of 3 (i386).
- Use pthread__self() internally, not pthread_self().
- Use __attribute__ ((visibility("hidden"))) in some places.
- Kill PTHREAD_MAIN_DEBUG.
the following do not wake other threads early:
pthread_mutex_lock(&mutex);
pthread_cond_broadcast(&cond);
foo = malloc(100); /* takes libc mutexes */
pthread_mutex_unlock(&mutex);
- Update some comments and fix minor bugs. Minor cosmetic changes.
- Replace some spinlocks with mutexes and rwlocks.
- Change the process private semaphores to use mutexes and condition
variables instead of doing the synchronization directly. Spinlocks
are no longer used by the semaphore code.
Instead, make the deferred wakeup list a per-thread array and pass down
the lwpid_t's that way.
- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
In this way there should never be contention on the CV's spinlock if
the app follows POSIX rules (there should only be contention on the
user-provided mutex).
- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
there is contention. This is enabled where atomic ops are available. Right
now that is only i386 and amd64 because I don't have other hardware to
test with. It's trivial to add stubs for other architectures as long as
they have compare-and-swap. When we have proper atomic ops the old rwlock
code can be removed.
- Add a new mutex implementation that's similar to the kernel's mutexes, but
uses compare-and-swap to maintain the waiters list, so no spinlocks are
involved. Same caveats apply as for the rwlocks.
detach/join.
- Make mutex acquire spin for a short time, as done with spinlocks.
- Make the number of spins controllable with the env var PTHREAD_NSPINS.
- Reduce the amount of time that libpthread internal spinlocks are held.
- Rely more on the barrier effects of park/unpark to avoid taking spinlocks.
- Simplify the locking around pthreads and the global queues.
- Align per-thread sync data on a 128 byte boundary.
- Offset thread stacks by a small amount to try and reduce cache thrash.
so avoid making them.
- When parking an LWP on a condition variable, point the hint argument at
the mutex's waiters queue. Chances are we will be awoken from that later.
be used only as as a hint. Clear the pointer when releasing the mutex.
- When releasing a mutex, wake all waiters. Makes it possible to tranfer
waiters from another object to a mutex.
mutex thread, thus leaving a thread sleeping on an unlocked mutex.
Reviewed by myself and Christos.
Problem reported by Arne H. Juul, arnej at pvv dot ntnu dot no,
in PR 26208. This fix represents option 1 presented in the PR.
any threads are created turned out to be not such a good idea.
there are stronger requirements on what has to work in a forked child
while a process is still single-threaded. so take all that stuff
back out and fix the problems with single-threaded programs that
are linked with libpthread differently, by checking if the library
has been started and doing completely different stuff if it hasn't been:
- for pthread_rwlock_timedrdlock(), just fail with EDEADLK immediately.
- for sem_wait(), the only thing that can unlock the semaphore is a
signal handler, so use sigsuspend() to wait for a signal.
- for pthread_mutex_lock_slow(), just go into an infinite loop
waiting for signals.
I also noticed that there's a "sem2" test that has never worked in its
single-threaded form. the problem there is that a signal handler tries
to take a sem_t interlock which is already held when the signal is received.
fix this too, by adding a single-threaded case for sig_trywait() that
blocks signals instead of using the userland interlock.
call pthread__start() if it hasn't already been called. this avoids
an internal assertion from the library if these routines are used
before any threads are created and they need to sleep.
fixes PR 20256, PR 24241, PR 25722, PR 26096.
- enable concurrency according to environment variable PTHREAD_CONCURRENCY
- add idle VP wakeup if there are additional jobs and idle VPs
- make reidlequeue per VP
- enable spinning for locks
- fix race condition in alarm processing
- fix race condition in mutex locking
- make debugging output line buffered and add VP prefix to debug lines
officially have undefined behavior, from returning an error code to raising
an assertion failure.
Also, don't bother to explicitly test for (illegal) null pointers and return
an error; they'll bomb out soon enough.
front of the sleep queue rather than the back. This is more
cache-friendly behavior and within the (lack of) constraints on wakeup
ordering imposed on equal-priority threads.
* Use a double-checked locking technique to avoid taking
the interlock in pthread_mutex_unlock().
* In pthread_mutex_lock() and pthread_mutex_trylock(), only store the
stack pointer, not the thread ID, in ptm_owner. Do the translation
to a thread ID in the slow-lock, errorcheck, and recursive mutex
cases rather than in the common path.
* Juggle where pthread__self() is called, to move it out of the fast path.
Overall, this means that neither pthread_self() nor
pthread_spin[un]lock() are used in the course of locking and unlocking
an uncontested mutex. Speeds up the fast path by 40-50%, and
eliminates about 98% of spinlocks used by a couple of large threaded
applications.
(Still a GET_MUTEX_PRIVATE() in the fast path... perhaps the type
should be in the main body of the mutex).