(found by... running the regress test!)
* clean up punctuation.
* create a proper frame for the child fn that follows the o32 calling
conventions. In particular, leave 4 stack slots that the child
fn can write on, put the GP above them, and invoke .cprestore
properly in light of the child fn arg area. (Realized it was a
problem upon inspection; verified using the regress test compiled
with -O0.)
code (which, uh, seems to be the default for a fresh build)... it wasn't
setting up v1 properly (the instruction to set up v1 was after the
return jump, in "reorder" code... i.e. after the end of the function).
That would break error returns from 64-bit syscalls (e.g. the checks
in dd, and who knows what else, that see whether input or output is
a pipe).
It looks like the non-_REENTRANT version was broken (on the nathanw-sa
branch) in rev 1.9.2.1 and fixed in 1.9.2.2, but the _REENTRANT version
was never fixed, and the broken bits were merged back onto the trunk.
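For context, the pipe checks in question are typically done with
lseek(2), whose 64-bit off_t return value is exactly the kind of
result affected. A minimal sketch of such a check (an illustration,
not the actual dd source):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * A dd-style pipe check (illustrative only).  lseek() returns a
 * 64-bit off_t, so on MIPS the result spans v0/v1; if v1 is not set
 * up on the error path, the -1 error return is mangled and this
 * ESPIPE test misfires.
 */
static int
is_pipe(int fd)
{
    return lseek(fd, 0, SEEK_CUR) == (off_t)-1 && errno == ESPIPE;
}

int
main(void)
{
    printf("stdin %s a pipe\n", is_pipe(STDIN_FILENO) ? "is" : "is not");
    return 0;
}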
trouble is caused by the memory allocation in the mutex initialization,
and uncontested mutexes and condition variables have become faster in the
meantime.
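The two initialization paths being compared look roughly like the
sketch below; the assumption (not stated above) is that a statically
initialized mutex defers the allocation to its first use, while
pthread_mutex_init() pays it up front.

#include <pthread.h>

/* Statically initialized: first lock may pay the one-time
 * allocation cost described above. */
static pthread_mutex_t smtx = PTHREAD_MUTEX_INITIALIZER;

/* Dynamically initialized: the allocation happens in init. */
static pthread_mutex_t dmtx;

int
main(void)
{
    pthread_mutex_init(&dmtx, NULL);

    pthread_mutex_lock(&smtx);   /* may trigger deferred init */
    pthread_mutex_unlock(&smtx);

    pthread_mutex_lock(&dmtx);   /* init cost already paid */
    pthread_mutex_unlock(&dmtx);

    pthread_mutex_destroy(&dmtx);
    return 0;
}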
pthread_cond_timedwait() is called before any threads have been
created and the SA infrastructure is up and running.
Addresses PR lib/20139.
XXX probably need to do this for all of the pthread_*_timedlock()
functions, too.
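A hypothetical standalone reproduction of the case being handled: the
initial thread does a timed wait before any pthread_create() call, so
no SA machinery exists yet and the wait must still time out cleanly.

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

int
main(void)
{
    struct timespec ts;
    int error;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += 1;              /* wait at most one second */

    pthread_mutex_lock(&mtx);
    /* No other threads exist; this must return ETIMEDOUT. */
    error = pthread_cond_timedwait(&cv, &mtx, &ts);
    pthread_mutex_unlock(&mtx);

    printf("timedwait returned %d before any threads existed\n", error);
    return 0;
}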
over a sleep queue and puts everything on the run queue. This permits
the iteration to be inside the acquisition of the run queue spinlock,
avoiding repetitive acquire/release cycles.
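The pattern, in a hedged sketch with hypothetical names (thread,
queue, spin_lock, and runq are stand-ins, not the libpthread
internals):

#include <stddef.h>

struct thread { struct thread *next; };
struct queue { struct thread *head; };

static volatile int runq_lock;
static struct queue runq;

static void
spin_lock(volatile int *l)
{
    while (__sync_lock_test_and_set(l, 1))
        continue;
}

static void
spin_unlock(volatile int *l)
{
    __sync_lock_release(l);
}

/* Old shape: one acquire/release cycle per thread moved. */
void
wake_all_slow(struct queue *sleepq)
{
    struct thread *t;

    while ((t = sleepq->head) != NULL) {
        sleepq->head = t->next;
        spin_lock(&runq_lock);
        t->next = runq.head;
        runq.head = t;
        spin_unlock(&runq_lock);
    }
}

/* New shape: the whole iteration sits inside one lock section. */
void
wake_all_fast(struct queue *sleepq)
{
    struct thread *t;

    spin_lock(&runq_lock);
    while ((t = sleepq->head) != NULL) {
        sleepq->head = t->next;
        t->next = runq.head;
        runq.head = t;
    }
    spin_unlock(&runq_lock);
}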
pthread_cond_broadcast(): use double-checked locking to avoid
pthread__self() and pthread_spinlock() when signaling or broadcasting
on a condition variable with no waiters.
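The shape of that double-checked test, sketched with hypothetical
field names (not the real pthread_cond_t layout): peek at the waiter
list without the interlock, and take the lock only if a waiter might
be present.

#include <stddef.h>

struct waiter { struct waiter *next; };
struct cond {
    volatile int   lock;      /* interlock spinlock */
    struct waiter *waiters;   /* list of blocked threads */
};

static void
spin_lock(volatile int *l)
{
    while (__sync_lock_test_and_set(l, 1))
        continue;
}

static void
spin_unlock(volatile int *l)
{
    __sync_lock_release(l);
}

int
cond_broadcast(struct cond *cv)
{
    /* First check, without the interlock: the no-waiters case
     * returns here, touching neither pthread__self() nor the lock. */
    if (cv->waiters == NULL)
        return 0;

    spin_lock(&cv->lock);
    /* Re-check under the lock; waiters may have come or gone. */
    while (cv->waiters != NULL) {
        struct waiter *w = cv->waiters;
        cv->waiters = w->next;
        /* ... put w on the run queue ... */
        (void)w;
    }
    spin_unlock(&cv->lock);
    return 0;
}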
* Use a double-checked locking technique to avoid taking
the interlock in pthread_mutex_unlock().
* In pthread_mutex_lock() and pthread_mutex_trylock(), only store the
stack pointer, not the thread ID, in ptm_owner. Do the translation
to a thread ID in the slow-lock, errorcheck, and recursive mutex
cases rather than in the common path.
* Juggle where pthread__self() is called, to move it out of the fast path.
Overall, this means that neither pthread_self() nor
pthread_spin[un]lock() are used in the course of locking and unlocking
an uncontested mutex. Speeds up the fast path by 40-50%, and
eliminates about 98% of spinlocks used by a couple of large threaded
applications.
(There's still a GET_MUTEX_PRIVATE() in the fast path... perhaps the
type should be in the main body of the mutex.) A sketch of the
resulting fast path follows.
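Putting the pieces together, the uncontested path might look like the
hedged sketch below. The layout and names are hypothetical, not the
actual libpthread code; the point is that the common path is a single
CAS on an opaque owner token, with pthread_self() and the interlock
pushed into the slow path.

#include <stdint.h>

/* Hypothetical mutex; not the real struct pthread_mutex__. */
struct mutex {
    volatile uintptr_t owner;   /* 0 = unlocked, else owner token */
};

static uintptr_t
lock_token(void)
{
    /* Approximates "store the stack pointer, not the thread ID". */
    return (uintptr_t)__builtin_frame_address(0);
}

/* Placeholder slow path; the real one handles waiters, errorcheck
 * and recursive mutexes, and translates the token to a thread ID. */
static int
mutex_lock_slow(struct mutex *m)
{
    while (!__sync_bool_compare_and_swap(&m->owner, 0, lock_token()))
        continue;
    return 0;
}

int
mutex_lock(struct mutex *m)
{
    /* Uncontested path: one CAS, no pthread_self(), no spinlock. */
    if (__sync_bool_compare_and_swap(&m->owner, 0, lock_token()))
        return 0;
    return mutex_lock_slow(m);
}

int
mutex_unlock(struct mutex *m)
{
    /* Release without taking the interlock; a full implementation
     * would then re-check for recorded waiters (the double-checked
     * part) and enter the slow path if any were found. */
    m->owner = 0;
    return 0;
}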
* Implement pthread_kill().
* Return the old thread mask, not the old process mask, in our
interpositioned sigaction call.
* Refer to _NSIG, not NSIG.
* Gut pthread_sigmask(). It was handling a lot of corner cases that
weren't legal anyway. Handle unblocked signals with a new
pthread__kill_self() routine (also used by pthread_kill()).
* Be more consistent with locking around pt_sigacts[].
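For reference, the standard usage these entries enable: deliver a
signal to one chosen thread, with per-thread masks deciding who can
take it. (The sleep() here is crude synchronization, for brevity
only.)

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void
on_usr1(int sig)
{
    (void)sig;    /* async-signal-safe no-op handler */
}

static void *
worker(void *arg)
{
    sigset_t set;

    (void)arg;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);  /* only we take it */
    pause();    /* wait for the signal */
    return NULL;
}

int
main(void)
{
    pthread_t t;
    sigset_t set;

    signal(SIGUSR1, on_usr1);

    /* Block SIGUSR1 in the main thread; the worker unblocks it. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    pthread_create(&t, NULL, worker, NULL);
    sleep(1);                    /* crude: let the worker block */
    pthread_kill(t, SIGUSR1);    /* deliver to that thread only */
    pthread_join(t, NULL);

    printf("worker received SIGUSR1\n");
    return 0;
}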
switch statement, and moving upcall-type-specific code into that switch.
Beneficial side effect: don't manipulate a statelock before lock resolution
occurs.
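The resulting shape, in a heavily hedged sketch (all names are
hypothetical; the real SA upcall types and handlers differ): one
entry point, one switch, with the per-type work inside each case, and
nothing touching a state lock until lock resolution has run.

enum upcall_type {
    UPCALL_BLOCKED,
    UPCALL_UNBLOCKED,
    UPCALL_PREEMPTED
};

struct upcall { enum upcall_type type; };

static void resolve_locks(struct upcall *u) { (void)u; /* ... */ }
static void handle_blocked(struct upcall *u) { (void)u; /* ... */ }
static void handle_unblocked(struct upcall *u) { (void)u; /* ... */ }
static void handle_preempted(struct upcall *u) { (void)u; /* ... */ }

void
upcall(struct upcall *u)
{
    /* Common prologue: lock resolution happens first... */
    resolve_locks(u);

    /* ...then the upcall-type-specific work, all in one switch. */
    switch (u->type) {
    case UPCALL_BLOCKED:
        handle_blocked(u);
        break;
    case UPCALL_UNBLOCKED:
        handle_unblocked(u);
        break;
    case UPCALL_PREEMPTED:
        handle_preempted(u);
        break;
    }
}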