- Update TCB for the initial thread in pthread__initthread, not
pthread__init to get it valid as soon as possible.
- Don't overwrite the pt_tls field in pthread__initthread.
- Don't deallocate pt_tls in pthread__scrubthread. This worked more by
chance than by design.
- Handle freeing the TLS area in pthread_create after removing the
thread instance from the dead queue.
_rtld_tls_allocate and _rtld_tls_free. libpthread uses this functions to
setup the thread private area of all new threads. ld.elf_so is
responsible for setting up the private area for the initial thread.
Similar functions are called from _libc_init for static binaries, using
dl_iterate_phdr to access the ELF Program Header.
Add test cases to exercise the different TLS storage models. Test cases
are compiled and installed on all platforms, but are skipped on
platforms not marked for TLS support.
This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.
It is inspired by the TLS support in FreeBSD by Doug Rabson and the
clean ups of the DragonFly port of the original FreeBSD modifications.
the situation, I decided to commit it. There is an inherent problem
with ASLR and the way the pthread library is using the thread stack.
Our pthread library chooses that stack for each thread strategically
so that it can locate the location of the pthread struct for each
thread by masking the stack pointer and looking just below the red
zone it creates. Unfortunately with ASLR you get many random values
for the initial stack, and there are situations where the masked
stack base ends up below the base of the stack. (this happens on
x86 when the stack base happens to be 0x???02000 for example and
your mask is stackmask is 0xffe00000). To fix this, we detect the
pathological cases (this happens only in the main thread), allocate
more stack, and mprotect it appropriately. Then we stash the main
base and the main struct, so that when we look for the pthread
struct in pthread__id, we can special case the main thread.
Another way to work around the problem is unlimiting stacksize,
but the proper way is to use TLS to find the thread structure and
not to play games with the thread stacks.
to reuse, because if we lose the race with the kernel we are never going
to reuse any elements. Look in the whole list instead.
XXX: should be pulled up to 5.x
- Rely on _lwp_makecontext() to set up the thread identity register.
This is not currently done (a bug), nor does libpthread use the
threadreg yet. I'm doing this so it the code can be used by the
person working on TLS to verify that their threadreg code is working.
implementation:
-don't return a difference, this can overflow
-don't try to substract typed pointers which don't belong to the
same object, this gives undefined results
This fixes instabilities of programs which use more than a handful
of threads, eg spuriously failing pthread_join().
XXX This must not be enabled by default because the LWP private mechanism
is reserved for TLS. It is provided only as a test/demo.
XXX Since ucontext_t does not contain the thread private variable, for a
short time after threads are created their thread specific data is unset.
If a signal arrives during that time we are screwed.
- No longer need pthread__osrev.
- Rearrange _lwp_ctl() calls slightly.
other systems. Allows poorly written applications to appear working. If you
are developing pthread apps please turn it on manually by setting the
environment variable.
SCHED_FIFO threads
- Change condvar sync so that we never take the condvar's spinlock without
first holding the caller-provided mutex. Previously, the spinlock was only
taken without the mutex in an error path, but it was enough to trigger the
problem described in the PR.
- Even with this change, applications calling pthread_cond_signal/broadcast
without holding the interlocking mutex are still subject to the problem
described in the PR. POSIX discourages this saying that it leads to
undefined scheduling behaviour, which seems good enough for the time being.
- Elsewhere, use a hash of mutexes instead of per-object spinlocks to
synchronize entry/exit from sleep queues.
- Simplify how sleep queues are maintained.
- Add new functions: pthread_mutex_held_np, mutex_owner_np, rwlock_held_np,
rwlock_wrheld_np, rwlock_rdheld_np. These match the kernel's locking
primitives and can be used when porting kernel code to userspace.
- Always create LWPs detached. Do join/exit sync mostly in userland. When
looped on a dual core box this seems ~30% quicker than using lwp_wait().
Reduce number of lock acquire/release ops during thread exit.
- Play scrooge again and chop more cycles off acquire/release.
- Spin while the lock holder is running on another CPU (adaptive mutexes).
- Do non-atomic release.
Threadreg:
- Add the necessary hooks to use a thread register.
- Add the code for i386, using %gs.
- Leave i386 code disabled until xen and COMPAT_NETBSD32 have the changes.
- Override __libc_thr_init() instead of using our own constructor.
- Add pthread__getenv() and use instead of getenv(). This is used before
we are up and running and unfortunatley getenv() takes locks.
Other changes:
- Cache the spinlock vectors in pthread__st. Internal spinlock operations
now take 1 function call instead of 3 (i386).
- Use pthread__self() internally, not pthread_self().
- Use __attribute__ ((visibility("hidden"))) in some places.
- Kill PTHREAD_MAIN_DEBUG.
- Update some comments and fix minor bugs. Minor cosmetic changes.
- Replace some spinlocks with mutexes and rwlocks.
- Change the process private semaphores to use mutexes and condition
variables instead of doing the synchronization directly. Spinlocks
are no longer used by the semaphore code.
Instead, make the deferred wakeup list a per-thread array and pass down
the lwpid_t's that way.
- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
In this way there should never be contention on the CV's spinlock if
the app follows POSIX rules (there should only be contention on the
user-provided mutex).
- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
there is contention. This is enabled where atomic ops are available. Right
now that is only i386 and amd64 because I don't have other hardware to
test with. It's trivial to add stubs for other architectures as long as
they have compare-and-swap. When we have proper atomic ops the old rwlock
code can be removed.
- Add a new mutex implementation that's similar to the kernel's mutexes, but
uses compare-and-swap to maintain the waiters list, so no spinlocks are
involved. Same caveats apply as for the rwlocks.