- Add new functions: pthread_mutex_held_np, mutex_owner_np, rwlock_held_np,
rwlock_wrheld_np, rwlock_rdheld_np. These match the kernel's locking
primitives and can be used when porting kernel code to userspace.
- Always create LWPs detached. Do join/exit sync mostly in userland. When
looped on a dual core box this seems ~30% quicker than using lwp_wait().
Reduce number of lock acquire/release ops during thread exit.
- Play scrooge again and chop more cycles off acquire/release.
- Spin while the lock holder is running on another CPU (adaptive mutexes).
- Do non-atomic release.
Threadreg:
- Add the necessary hooks to use a thread register.
- Add the code for i386, using %gs.
- Leave i386 code disabled until xen and COMPAT_NETBSD32 have the changes.
- Override __libc_thr_init() instead of using our own constructor.
- Add pthread__getenv() and use instead of getenv(). This is used before
we are up and running and unfortunatley getenv() takes locks.
Other changes:
- Cache the spinlock vectors in pthread__st. Internal spinlock operations
now take 1 function call instead of 3 (i386).
- Use pthread__self() internally, not pthread_self().
- Use __attribute__ ((visibility("hidden"))) in some places.
- Kill PTHREAD_MAIN_DEBUG.
-falign-functions=32, since these two really get hammered on. To make them
faster needs a threadreg or TLS, unless there is a way to tell gcc that a
library-local (pthread__threadmask) variable does not need to be PIC.
preparing to sleep on a mutex are slow in relative terms, so this allows
us to recover from short lock holds without blocking, while not wasting
too much time on longer holds.
architecture to provide asm versions of the RAS operations.
We do this because relying on the compiler to get the RAS right is not
sensible. (It gets alpha wrong and hppa is suboptimal)
Provide asm RAS ops for hppa.
(A slightly different version) reviewed by Andrew Doran.
- Allow callers to try and release an unheld rwlock. Just return EPERM
as mandated by IEEE Std 1003.1.
- Use pthread__atomic_swap_ptr() to set in the new lock value. At this
point the lock word can't have changed.
pthread__rwlock_wrlock, pthread__rwlock_rdlock:
- Mask out the waiter bits in the lock word before checking to see if
the current thread is about to lock against itself.
the following do not wake other threads early:
pthread_mutex_lock(&mutex);
pthread_cond_broadcast(&cond);
foo = malloc(100); /* takes libc mutexes */
pthread_mutex_unlock(&mutex);
- Eliminate mutexattr_private and just set a bit in ptm_owner if the mutex
is recursive. This forces the slow path to be taken for recursive mutexes.
Overload an unused field in pthread_mutex_t to record whether or not it's
an errorcheck mutex.
- Streamline pthread_mutex_lock / pthread_mutex_unlock a bit more. As a
side effect makes it possible to have assembly stubs for them.