- Update some comments and fix minor bugs. Minor cosmetic changes.
- Replace some spinlocks with mutexes and rwlocks.
- Change the process private semaphores to use mutexes and condition
variables instead of doing the synchronization directly. Spinlocks
are no longer used by the semaphore code.
Instead, make the deferred wakeup list a per-thread array and pass down
the lwpid_t's that way.
- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
In this way there should never be contention on the CV's spinlock if
the app follows POSIX rules (there should only be contention on the
user-provided mutex).
- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
there is contention. This is enabled where atomic ops are available. Right
now that is only i386 and amd64 because I don't have other hardware to
test with. It's trivial to add stubs for other architectures as long as
they have compare-and-swap. When we have proper atomic ops the old rwlock
code can be removed.
- Add a new mutex implementation that's similar to the kernel's mutexes, but
uses compare-and-swap to maintain the waiters list, so no spinlocks are
involved. Same caveats apply as for the rwlocks.
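The idea of maintaining a waiters list with compare-and-swap instead of a spinlock can be sketched roughly as follows. This is only an illustration using C11 atomics: struct waiter, struct waitq, waitq_push() and waitq_take_all() are hypothetical names, not the structures or functions used in libpthread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical waiter record, one per sleeping thread (illustration only). */
struct waiter {
	struct waiter *next;
	int id;
};

/* Head of an intrusive LIFO list of waiters. */
struct waitq {
	_Atomic(struct waiter *) head;
};

/*
 * Push w onto the list. If another thread changes the head between
 * our read and the compare-and-swap, the CAS fails, 'old' is reloaded
 * with the current head, and we retry. No lock is taken.
 */
static void
waitq_push(struct waitq *q, struct waiter *w)
{
	struct waiter *old;

	assert(w != NULL);
	old = atomic_load_explicit(&q->head, memory_order_relaxed);
	do {
		w->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&q->head, &old, w,
	    memory_order_release, memory_order_relaxed));
}

/*
 * Detach the whole list in one atomic exchange so the caller can walk
 * it privately (for example, to wake every waiter).
 */
static struct waiter *
waitq_take_all(struct waitq *q)
{
	return atomic_exchange_explicit(&q->head, (struct waiter *)NULL,
	    memory_order_acquire);
}
```

Detaching the whole list at once sidesteps the ABA problem that a CAS-based pop of a single element would have to handle; whether the real implementation does it this way is not stated above.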
yielding. This is a nasty band-aid but with many threads, looping over
sched_yield() wastes a huge amount of CPU time. It would be nice to have a
way to temporarily disable preemption, but it turns out that's yet another
no-brain concept that has been patented and the patent holder seems to be
suing people lately. Another alternative is probably to have kernel-assisted
spinlocks.
This fix is about the best we can do given the current interfaces. We
could extend the cgetcap(3) interfaces with a function that would return
a character count and handle this in libterm which would provide a more
complete fix and allow a NULL character in a string capability.
leads to loss of precision, which causes rounding in the wrong direction
for the case 0.5-epsilon. Use floor() instead.
This also fixes a wrong sign of zero returned with non-default rounding
directions.
Most complex function implementations are from the "c9x-complex" library,
originating from the "cephes" math library, see
http://www.netlib.org/cephes/, from Stephen L. Moshier, incorporated and
redistributed with the NetBSD license by permission of the author.
Error behaviour and other boundary conditions (branch cuts)
need to be looked at.
For namespace sanity, I've done the rename/weak alias procedure to
most of the exported functions which are also used internally.
Didn't do so for sin/cos(f) yet because assembler implementations use
them directly, and renaming functions shared between the main libm
and the machine specific "overlay" might raise binary compatibility
issues.
Chops another ~10% off create/join in a loop on i386.
- Disable low level debugging as this is stable. Improves benchmarks
across the board by a small percentage. Uncontested mutex acquire
and release in a loop becomes about 8% quicker.
- Minor cleanup.
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().
Add a new libc function, getpeereid(), to easily get at the euid and egid.
As a consequence, bump libc's minor number.
Document the LOCAL_PEEREID socket option in unix(4).
Based on contribution by Arne H. Juul, minor modifications by myself.
of pending atexit handlers before the structure is reused. This prevents
__cxa_finalize from going into an infinite loop when an atexit handler
registers a new atexit handler, as in:
#include <stdlib.h>

void two(void) {
}

void one(void) {
	atexit(two);
}

int main(void) {
	atexit(one);
	return 0;
}
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes to sleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.