Instead, make the deferred wakeup list a per-thread array and pass down
the lwpid_t's that way.
- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
In this way there should never be contention on the CV's spinlock if
the app follows POSIX rules (there should only be contention on the
user-provided mutex).
- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
there is contention. This is enabled where atomic ops are available. Right
now that is only i386 and amd64 because I don't have other hardware to
test with. It's trivial to add stubs for other architectures as long as
they have compare-and-swap. When we have proper atomic ops the old rwlock
code can be removed.
- Add a new mutex implementation that's similar to the kernel's mutexes, but
uses compare-and-swap to maintain the waiters list, so no spinlocks are
involved. Same caveats apply as for the rwlocks.
Chops another ~10% off create/join in a loop on i386.
- Disable low level debugging as this is stable. Improves benchmarks
across the board by a small percentage. Uncontested mutex acquire
and release in a loop becomes about 8% quicker.
- Minor cleanup.
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
detach/join.
- Make mutex acquire spin for a short time, as done with spinlocks.
- Make the number of spins controllable with the env var PTHREAD_NSPINS.
- Reduce the amount of time that libpthread internal spinlocks are held.
- Rely more on the barrier effects of park/unpark to avoid taking spinlocks.
- Simplify the locking around pthreads and the global queues.
- Align per-thread sync data on a 128 byte boundary.
- Offset thread stacks by a small amount to try and reduce cache thrash.
so avoid making them.
- When parking an LWP on a condition variable, point the hint argument at
the mutex's waiters queue. Chances are we will be awoken from that later.
the process' signal mask -- this ends up in a no-op.
Use the system call directly instead.
(This might be done in pthread_sig.c, but for now I wanted a simple
patch which is easily tested and pulled up.)
so that the main thread is not different from others.
as a side effect, fix memory leak in pthread_create on error.
- make pthread__stackid_setup return a error rather than calling err(2).
any threads are created turned out to be not such a good idea.
there are stronger requirements on what has to work in a forked child
while a process is still single-threaded. so take all that stuff
back out and fix the problems with single-threaded programs that
are linked with libpthread differently, by checking if the library
has been started and doing completely different stuff if it hasn't been:
- for pthread_rwlock_timedrdlock(), just fail with EDEADLK immediately.
- for sem_wait(), the only thing that can unlock the semaphore is a
signal handler, so use sigsuspend() to wait for a signal.
- for pthread_mutex_lock_slow(), just go into an infinite loop
waiting for signals.
I also noticed that there's a "sem2" test that has never worked in its
single-threaded form. the problem there is that a signal handler tries
to take a sem_t interlock which is already held when the signal is received.
fix this too, by adding a single-threaded case for sig_trywait() that
blocks signals instead of using the userland interlock.
if the target thread is a zombie.
in all the functions that didn't do so already, verify a pthread_t before
dereferencing it (under #ifdef ERRORCHECK, since these checks are not
mandated by the standard).
clean up some debugging stuff.
call pthread__start() if it hasn't already been called. this avoids
an internal assertion from the library if these routines are used
before any threads are created and they need to sleep.
fixes PR 20256, PR 24241, PR 25722, PR 26096.
namely, put the thread to deadqueue rather than just leaking it.
- fix a race between pthread_detach/join and pthread_exit,
which also causes dead thread leaks.
- statically initialize all global spin locks. on hppa, 0 means
the lock is held, so leaving them with the default value doesn't work.
- compare functions pointers using a function-pointer type rather than
an integral type. on hppa, function pointers may be indirect,
so we need to trigger gcc to emit calls to the function-pointer
canonicalization routines in the millicode.
- on hppa the stack grows up, so handle that using the STACK_* macros.
- enable concurrency according to environment variable PTHREAD_CONCURRENCY
- add idle VP wakeup if there are additional jobs and idle VPs
- make reidlequeue per VP
- enable spinning for locks
- fix race condition in alarm processing
- fix race condition in mutex locking
- make debugging output line buffered and add VP prefix to debug lines