this implementation is extremely ugly and inefficient, but it avoids a
good deal of code duplication and bloat. it may be cleaned up later to
eliminate the remaining code duplication and some of the warts, but i
don't really care about its performance.
note that swprintf is not yet implemented.
some of this code should be cleaned up, e.g. using macros for some of
the bit flags, masks, etc. nonetheless, the code is believed to be
working and correct at this point.
if the mutex was previously locked, we can assume pthread_self was
already called at the time of locking, and thus that the thread
pointer is initialized.
the layout has been chosen so that pointer slots 3 and 4 fit between
the integer slots on 32-bit archs, and come after the integer slots on
64-bit archs.
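
a minimal sketch of the overlay idea described above. the union, the slot counts, and the choice of int slots 0, 1, 2 and 5 as the "integer slots" are assumptions made for illustration, not taken from the actual headers.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical object: int slots overlaid with pointer slots.
 * the slot counts are an assumption chosen for illustration. */
union demo_slots {
	int i[sizeof(long) == 8 ? 10 : 6];
	void *p[sizeof(long) == 8 ? 5 : 6];
};

/* byte offset of int slot n and of pointer slot n */
static size_t demo_int_off(size_t n) { return n * sizeof(int); }
static size_t demo_ptr_off(size_t n) { return n * sizeof(void *); }

/* returns 1 if pointer slots 3 and 4 overlap none of the int slots
 * assumed to be in use (0, 1, 2 and 5), else 0. */
static int demo_ptrs_clear_of_ints(void)
{
	static const size_t used[] = { 0, 1, 2, 5 };
	size_t lo = demo_ptr_off(3), hi = demo_ptr_off(5);

	/* both pointer slots must lie inside the object */
	assert(sizeof(union demo_slots) >= hi);

	for (size_t k = 0; k < sizeof used / sizeof *used; k++) {
		size_t a = demo_int_off(used[k]), b = a + sizeof(int);
		if (a < hi && lo < b) return 0; /* byte ranges intersect */
	}
	return 1;
}
```

with 4-byte int and 4-byte pointers, the two pointer slots cover bytes 12..19, between int slot 2 and int slot 5. with 8-byte pointers they cover bytes 24..39, past int slot 5.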
for some reason these functions are not shaded by the PS/TPS option in
POSIX, so presumably they are mandatory, even though the functionality
they offer is optional. for now, provide them in case any programs
depend on their existence, but disallow any priority except the
default.
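
a rough sketch of "provide the function but accept only the default". the struct, the function names, the default value of 0, and the choice of ENOTSUP as the error are all hypothetical stand-ins, not the real pthread interfaces.

```c
#include <assert.h>
#include <errno.h>

/* hypothetical attribute object and default; not the real types */
struct demo_attr { int prio; };
#define DEMO_DEFAULT_PRIO 0

/* accepts only the default priority; anything else is refused.
 * returning ENOTSUP is an assumption made for this sketch. */
static int demo_setprio(struct demo_attr *a, int prio)
{
	assert(a);
	if (prio != DEMO_DEFAULT_PRIO) return ENOTSUP;
	a->prio = prio;
	return 0;
}

/* always reports whatever was stored, i.e. the default */
static int demo_getprio(const struct demo_attr *a, int *prio)
{
	assert(a && prio);
	*prio = a->prio;
	return 0;
}
```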
multiple opens of the same named semaphore must return the same
pointer, and only the last close can unmap it. thus the ugly global
state keeping track of mappings. the maximum number of distinct named
semaphores that can be opened is small enough that the linear
searches take trivial time, especially compared to the syscall
overhead of these functions.
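
the bookkeeping described above can be sketched as a small table with a reference count per entry. everything here is hypothetical: the names, the table size, and malloc standing in for the real mapping. locking is omitted, so this is only safe single-threaded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX 8       /* assumed limit on distinct open names */
#define DEMO_NAMELEN 32  /* assumed maximum name length, incl. nul */

struct demo_entry { char name[DEMO_NAMELEN]; void *map; int refcnt; };
static struct demo_entry demo_tab[DEMO_MAX];

/* returns the existing mapping if the name is already open,
 * otherwise creates one in the first free slot. */
static void *demo_open(const char *name)
{
	struct demo_entry *free_slot = 0;
	assert(name);
	if (strlen(name) >= DEMO_NAMELEN) return 0;
	for (int i = 0; i < DEMO_MAX; i++) {
		struct demo_entry *e = &demo_tab[i];
		if (e->refcnt && !strcmp(e->name, name)) {
			e->refcnt++;
			return e->map;
		}
		if (!e->refcnt && !free_slot) free_slot = e;
	}
	if (!free_slot) return 0;          /* table full */
	void *m = malloc(sizeof(int));     /* stands in for the mapping */
	if (!m) return 0;
	strcpy(free_slot->name, name);
	free_slot->map = m;
	free_slot->refcnt = 1;
	return m;
}

/* drops one reference; only the last close releases the mapping.
 * returns -1 if the pointer is not a currently open mapping. */
static int demo_close(void *map)
{
	for (int i = 0; i < DEMO_MAX; i++) {
		struct demo_entry *e = &demo_tab[i];
		if (e->refcnt && e->map == map) {
			if (!--e->refcnt) {
				free(e->map);
				e->map = 0;
			}
			return 0;
		}
	}
	return -1;
}
```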
we can avoid blocking signals by simply using a flag to mark that the
thread has exited and prevent it from getting counted in the rsyscall
signal-pingpong. this restores the original pthread create/join
throughput from before the sigprocmask call was added.
1. any padding in the siginfo struct was not necessarily zero-filled,
so it might have contained private data off the caller's stack.
2. the uid and pid must be filled in from userspace. the previous
rsyscall fix broke rsyscalls because the values were always incorrect.
these functions are inconsistent in whether they're specified to
return an error value, or to return -1 and set errno. hopefully now
they all match what POSIX requires.
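
for reference, the two conventions look like this. both functions are hypothetical and exist only to show the difference in how the same failure is reported.

```c
#include <assert.h>
#include <errno.h>

/* convention 1: return 0 on success or the error number itself;
 * errno is left untouched. */
static int demo_half_errval(int x, int *out)
{
	assert(out);
	if (x < 0) return EINVAL;
	*out = x / 2;
	return 0;
}

/* convention 2: return 0 on success, or -1 with errno set */
static int demo_half_errno(int x, int *out)
{
	int e = demo_half_errval(x, out);
	if (e) {
		errno = e;
		return -1;
	}
	return 0;
}
```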
the set_tid_address syscall returns the tid (which is also the pid when called
from the initial thread) so there is no need to make a separate
syscall to get pid/tid.
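
a linux-only sketch of the point above, calling the raw syscall through syscall(). the variable and function names are hypothetical. note that in a program running on another libc this replaces the address that libc registered for the calling thread, so it is meant as a demonstration only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* the kernel writes 0 here when the calling thread exits */
static int demo_tid_slot;

/* registers the clear-on-exit address and returns the caller's
 * tid, so no separate gettid/getpid call is needed. */
static long demo_init_tid(void)
{
	long tid = syscall(SYS_set_tid_address, &demo_tid_slot);
	assert(tid > 0);
	return tid;
}
```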
a signal handler could fork after the pid/tid were read, causing the
wrong process to be signalled. i'm not sure if this is supposed to
be UB or not, but raise is async-signal-safe, so it is probably
allowed. the current solution is slightly expensive so this
implementation is likely to be changed in the future.
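
one way to close this window, sketched for linux: block all signals while the ids are read and the signal is sent, so no handler can run (and fork) in between. this is an assumption about a possible fix, not necessarily the approach the text calls slightly expensive. all names are hypothetical, and sigprocmask is used on the assumption of a single-threaded caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* hypothetical raise: read pid/tid and send with signals blocked */
static int demo_raise(int sig)
{
	sigset_t all, old;
	sigfillset(&all);
	if (sigprocmask(SIG_BLOCK, &all, &old)) return -1;
	long pid = syscall(SYS_getpid);
	long tid = syscall(SYS_gettid);
	int r = (int)syscall(SYS_tgkill, pid, tid, sig);
	/* restoring the mask lets the now-pending signal be delivered */
	sigprocmask(SIG_SETMASK, &old, 0);
	return r;
}

static volatile sig_atomic_t demo_got;
static void demo_handler(int sig) { demo_got = sig; }

/* installs a handler, raises SIGUSR1, returns what the handler saw */
static int demo_check(void)
{
	sigset_t s;
	sigemptyset(&s);
	sigaddset(&s, SIGUSR1);
	sigprocmask(SIG_UNBLOCK, &s, 0); /* make sure it can be delivered */
	signal(SIGUSR1, demo_handler);
	int r = demo_raise(SIGUSR1);
	assert(r == 0);
	return demo_got;
}
```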