- Add new functions: pthread_mutex_held_np, mutex_owner_np, rwlock_held_np,
rwlock_wrheld_np, rwlock_rdheld_np. These match the kernel's locking
primitives and can be used when porting kernel code to userspace.
- Always create LWPs detached. Do join/exit sync mostly in userland. When
looped on a dual core box this seems ~30% quicker than using lwp_wait().
Reduce number of lock acquire/release ops during thread exit.
Chops another ~10% off create/join in a loop on i386.
- Disable low level debugging as this is stable. Improves benchmarks
across the board by a small percentage. Uncontested mutex acquire
and release in a loop becomes about 8% quicker.
- Minor cleanup.
for that file to not be built with profiling. This makes it reasonable to
use pthread_{set,get}specific() to implement thread-safe profiline call counts.