(found by... running the regress test!)
* clean up punctuation.
* create a proper frame for the child fn that follows the o32 calling
conventions. In particular, leave 4 stack slots that the child
fn can write on, put the GP above them, and invoke .cprestore
properly in light of the child fn arg area. (realized it was a
problem upon inspection, verified using the regress test compiled
-O0.)
code (which, uh, seems the default for a fresh build)... it wasn't
setting up v1 properly (the instruction to set up v1 was after the
return jump, in "reorder" code... i.e. after the end of the function).
That would break error returns from 64-bit syscalls (e.g. checks
in dd and who knows what else) to see if input or output are pipes.
It looks like the non-_REENTRANT version was broken (on the nathanw-sa
branch) in rev 1.9.2.1 and fixed in 1.9.2.2, but the _REENTRANT version
was never fixed, and the broken bits were merged back on to the trunk.
trouble is caused by the memory allocation in the mutex initialization,
and uncontested mutexes and condition variables have become faster in the
meantime.
pthread_cond_timedwait() is called before any threads have been
created and the SA infrastructure is up and running.
Addresses PR lib/20139.
XXX probably need to do this for all of the pthread_*_timedlock()
functions, too.
over a sleep queue and puts everything on the run queue. This permits
the iteration to be inside the acquisition of the run queue spinlock,
avoiding repetitive acquire/release cycles.
pthread_cond_broadcast(): use double-checked locking to avoid
pthread__self() and pthread_spinlock() when signaling or broadcasting
on a condition variable with no waiters.
* Use a double-checked locking technique to avoid taking
the interlock in pthread_mutex_unlock().
* In pthread_mutex_lock() and pthread_mutex_trylock(), only store the
stack pointer, not the thread ID, in ptm_owner. Do the translation
to a thread ID in the slow-lock, errorcheck, and recursive mutex
cases rather than in the common path.
* Juggle where pthread__self() is called, to move it out of the fast path.
Overall, this means that neither pthread_self() nor
pthread_spin[un]lock() are used in the course of locking and unlocking
an uncontested mutex. Speeds up the fast path by 40-50%, and
eliminates about 98% of spinlocks used by a couple of large threaded
applications.
(Still a GET_MUTEX_PRIVATE() in the fast path... perhaps the type
should be in the main body of the mutex).
* Implement pthread_kill().
* Return the old thread mask, not the old process mask, in our
interpositioned sigaction call.
* Refer to _NSIG, not NSIG.
* Gut pthread_sigmask(). It was handling a lot of corner cases that
weren't legal anyway. Handle unblocked signals with a new
pthread__kill_self() routine (also used by pthread_kill()).
* Be more consistent with locking around pt_sigacts[].
switch statement, and moving upcall-type-specific code into that switch.
Beneficial side effect: don't manipulate a statelock before lock resolution
occurs.
by adding a generation number to the barrier structure and incrementing it
when the barrier fires.
XXX this is an ABI change for anything using barriers, but the library is
new enough and nothing in the tree uses barriers so I'm going to let it
slide. Using the private data pointer for a field that will always be present
would be excessive.
- Signal handlers now simply continue executing the current thread,
rather than trying to put themselves back on the queue that they came
from, which was rather fragile. As a result, all callers of
pthread__block() must be prepared to handle spurious wakeups.
- When a signal arrives for a thread that is blocked in the kernel,
note this in another field in pthread_st and set a flag. Process the
signal and set up the trampoline for the handler *after* the thread
unblocks, so that both the trampoline and the returned state from
the kernel are preserved.
- Factor out some code into a pthread__deliver_signal() routine;
the signal-taking code in pthread_sigmask() should be able to use this
soon.
This is still gross, and there are still some terrible MP issues lurking here,
but progress crawls along.
if the corresponding TSD entry is empty.
Cuts down lock/unlock pairs for this operation from 256 to the number
of active TSD entries; sicne this is done when every thread exits, it saves
many total lock/unlock pairs.