over a sleep queue and puts everything on the run queue. This permits
the iteration to be done while holding the run queue spinlock,
avoiding repeated acquire/release cycles.
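A minimal sketch of the pattern, with illustrative queue and lock
primitives (none of these names are the real libpthread internals):

    #include <sys/queue.h>

    struct thread {
            TAILQ_ENTRY(thread) t_q;
            /* ... */
    };
    TAILQ_HEAD(threadq, thread);

    extern void spin_lock(void *);      /* assumed lock primitives */
    extern void spin_unlock(void *);

    /* Drain a sleep queue into the run queue; the run queue spinlock
     * is taken once around the whole loop, not once per thread. */
    static void
    sched_sleepers(struct threadq *sleepq, struct threadq *runq,
        void *runq_lock)
    {
            struct thread *t;

            spin_lock(runq_lock);
            while ((t = TAILQ_FIRST(sleepq)) != NULL) {
                    TAILQ_REMOVE(sleepq, t, t_q);
                    TAILQ_INSERT_TAIL(runq, t, t_q);
            }
            spin_unlock(runq_lock);
    }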
pthread_cond_broadcast(): use double-checked locking to avoid
pthread__self() and pthread_spinlock() when signaling or broadcasting
on a condition variable with no waiters.
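A sketch of the double-checked test, with hypothetical names standing
in for pthread__self(), pthread_spinlock(), and the condvar fields:

    #include <sys/queue.h>

    struct waiter { TAILQ_ENTRY(waiter) w_q; };

    struct cond_sketch {
            void *c_lock;                       /* assumed spinlock */
            TAILQ_HEAD(, waiter) c_waiters;
    };

    extern void *my_self(void);         /* stands in for pthread__self() */
    extern void acquire(void *self, void *lock);
    extern void release(void *self, void *lock);
    extern void wake_all(struct cond_sketch *);

    int
    cond_broadcast_sketch(struct cond_sketch *c)
    {
            void *self;

            /* Unlocked check: with no waiters, return without calling
             * my_self() or touching the spinlock at all. */
            if (TAILQ_EMPTY(&c->c_waiters))
                    return 0;

            /* Re-check under the lock to close the race with a thread
             * enqueueing itself concurrently. */
            self = my_self();
            acquire(self, c->c_lock);
            if (!TAILQ_EMPTY(&c->c_waiters))
                    wake_all(c);
            release(self, c->c_lock);
            return 0;
    }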
* Use a double-checked locking technique to avoid taking
the interlock in pthread_mutex_unlock().
* In pthread_mutex_lock() and pthread_mutex_trylock(), only store the
stack pointer, not the thread ID, in ptm_owner. Do the translation
to a thread ID in the slow-lock, errorcheck, and recursive mutex
cases rather than in the common path.
* Juggle where pthread__self() is called, to move it out of the fast path.
Overall, this means that neither pthread_self() nor
pthread_spin[un]lock() is used in the course of locking and unlocking
an uncontested mutex (a sketch follows at the end of this entry).
This speeds up the fast path by 40-50% and eliminates about 98% of
the spinlock operations performed by a couple of large threaded
applications.
(There is still a GET_MUTEX_PRIVATE() in the fast path... perhaps the
mutex type should be stored in the main body of the mutex.)
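A sketch of the resulting fast path, written with C11 atomics and
invented names (the struct layout, current_sp(), and both helpers are
assumptions, not the real libpthread code). The owner field holds the
caller's stack pointer rather than a thread ID, and unlock consults a
waiter count before touching the interlock:

    #include <stdatomic.h>
    #include <stdint.h>

    struct mutex_sketch {
            _Atomic uintptr_t owner;    /* caller's SP; 0 = unlocked */
            _Atomic unsigned waiters;   /* nonzero if anyone is queued */
    };

    static inline uintptr_t
    current_sp(void)
    {
            /* Cheap stand-in for reading the stack pointer: the
             * address of a local variable lies on this thread's
             * stack. */
            int marker;

            return (uintptr_t)&marker;
    }

    int
    mutex_trylock_sketch(struct mutex_sketch *m)
    {
            uintptr_t expected = 0;

            /* Fast path: one CAS, no pthread_self(), no spinlock.
             * The slow path (not shown) translates owner back to a
             * thread ID for the errorcheck and recursive cases. */
            if (atomic_compare_exchange_strong(&m->owner, &expected,
                current_sp()))
                    return 0;
            return -1;
    }

    int
    mutex_unlock_sketch(struct mutex_sketch *m)
    {
            atomic_store(&m->owner, 0);

            /* Double-checked locking: only when a waiter has
             * registered itself do we take the interlock, re-check,
             * and wake someone (slow path omitted). */
            if (atomic_load(&m->waiters) != 0) {
                    /* ... interlocked wakeup ... */
            }
            return 0;
    }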
(1) ELFNAME(load_file)() now takes a pointer to the entry point
offset, instead of a pointer to the entry point itself. This
allows proper adjustment of the ultimate entry point at a higher level
if the object containing the entry point is moved before the exec is
finished.
(2) Introduce VMCMD_FIXED, which means the address at which a given
vmcmd describes a mapping is fixed (i.e., should not be moved). Don't
set this for entries pertaining to ld.so.
Also some minor comment/whitespace tweaks. (A sketch of the interface
change in (1) follows.)
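A hedged sketch of the calling convention in (1); load_file(),
exec_pkg, and ep_entry are simplified stand-ins for the real
ELFNAME(load_file)() interface:

    #include <stdint.h>

    struct exec_pkg {
            uintptr_t ep_entry;         /* final entry point */
    };

    /* Assumed signature: reports the entry point as an offset into
     * the loaded object, plus the base address actually chosen. */
    extern int load_file(const char *path, uintptr_t *entryoff_p,
        uintptr_t *base_p);

    static int
    setup_entry(struct exec_pkg *epp, const char *interp)
    {
            uintptr_t entryoff, base;
            int error;

            error = load_file(interp, &entryoff, &base);
            if (error)
                    return error;

            /* The object may have been moved before the exec
             * finished, e.g. to make room for a mapping marked
             * VMCMD_FIXED; because only the offset was recorded, the
             * final entry point can still be computed correctly. */
            epp->ep_entry = base + entryoff;
            return 0;
    }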
Our behavior is now consistent with Solaris, and more useful than it
was previously.
Unfortunately we end up strtol()-ing twice (once via atoi()) to avoid
changing find_parsenum().
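A sketch of the resulting double conversion; parse_arg() and
existing_helper() are invented, with the latter standing in for the
unchanged find_parsenum():

    #include <limits.h>
    #include <stdlib.h>

    extern int existing_helper(const char *);  /* converts via atoi() */

    static int
    parse_arg(const char *s)
    {
            char *ep;
            long v;

            /* First conversion: validate with strtol(). */
            v = strtol(s, &ep, 10);
            if (ep == s || *ep != '\0' || v < 0 || v > INT_MAX)
                    return -1;

            /* Second conversion happens inside the helper, which
             * still uses atoi(). */
            return existing_helper(s);
    }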
sysret (should it have entered through syscall), or via a plain
iret. This could be done in a quicker and dirtier way, but I've
decided against that for now.
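For context, a minimal sketch of the decision in C pseudocode; the
flag name, trapframe type, and helpers are all hypothetical, and the
real code is assembly:

    struct trapframe;                   /* opaque here */

    extern void return_via_sysret(struct trapframe *);
    extern void return_via_iret(struct trapframe *);

    #define MDF_SYSCALL 0x01            /* assumed: entered via syscall */

    void
    return_to_userland(struct trapframe *tf, unsigned mdflags)
    {
            if (mdflags & MDF_SYSCALL)
                    return_via_sysret(tf);      /* entered via syscall */
            else
                    return_via_iret(tf);        /* plain iret */
    }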
* Implement pthread_kill().
* Return the old thread mask, not the old process mask, from our
interposed sigaction call.
* Refer to _NSIG, not NSIG.
* Gut pthread_sigmask(). It was handling a lot of corner cases that
weren't legal anyway. Handle unblocked signals with a new
pthread__kill_self() routine, also used by pthread_kill(); a sketch
follows this list.
* Be more consistent with locking around pt_sigacts[].
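A sketch of a pthread__kill_self()-style helper; the name, signature,
and mask-swapping approach are assumptions rather than the real
libpthread routine. The idea is to deliver the signal to the calling
thread with that thread's own mask in effect:

    #include <signal.h>

    static void
    kill_self_sketch(int sig, const sigset_t *thread_mask)
    {
            sigset_t omask;

            /* Install the thread's mask so delivery honors
             * per-thread blocking, raise the signal synchronously,
             * then restore the previous mask. */
            sigprocmask(SIG_SETMASK, thread_mask, &omask);
            raise(sig);
            sigprocmask(SIG_SETMASK, &omask, NULL);
    }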