Commit Graph

78 Commits

Author SHA1 Message Date
lukem eca6fe7e73 fix spello in comment 2011-08-05 03:55:31 +00:00
matt def13dad30 Add __HAVE___LWP_GETTCB_FAST support (for mips and powerpc). 2011-03-17 00:43:48 +00:00
joerg 01eef02a1b If TLS support is present, use it for pthread__self(). The
initialisation order is correct in this case as _lwp_setprivate has been
called already by ld.elf_so for dynamic programs or _libc_init for
statically linked ones.
2011-03-16 12:39:44 +00:00
joerg aad599979d Add TLS support infrastructure. For dynamic binaries, ld.elf_so exports
_rtld_tls_allocate and _rtld_tls_free. libpthread uses these functions to
set up the thread private area of all new threads. ld.elf_so is
responsible for setting up the private area for the initial thread.
Similar functions are called from _libc_init for static binaries, using
dl_iterate_phdr to access the ELF Program Header.

Add test cases to exercise the different TLS storage models. Test cases
are compiled and installed on all platforms, but are skipped on
platforms not marked for TLS support.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.

It is inspired by the TLS support in FreeBSD by Doug Rabson and the
clean ups of the DragonFly port of the original FreeBSD modifications.
2011-03-09 23:10:05 +00:00
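
As an illustration of the static-binary path described above, here is a minimal sketch of locating the PT_TLS program header with dl_iterate_phdr(). The callback uses the standard <link.h> interface, but the helper and the tls_info structure are hypothetical stand-ins, not the actual _libc_init code:

	#include <link.h>
	#include <stddef.h>

	struct tls_info {
		size_t size;	/* p_memsz of the PT_TLS segment */
		size_t align;	/* p_align of the PT_TLS segment */
	};

	/* Hypothetical callback: record the initial TLS template's size
	 * and alignment from the ELF program headers. */
	static int
	find_pt_tls(struct dl_phdr_info *info, size_t len, void *cookie)
	{
		struct tls_info *ti = cookie;
		unsigned i;

		(void)len;
		for (i = 0; i < info->dlpi_phnum; i++) {
			if (info->dlpi_phdr[i].p_type == PT_TLS) {
				ti->size = info->dlpi_phdr[i].p_memsz;
				ti->align = info->dlpi_phdr[i].p_align;
				return 1;	/* non-zero stops the iteration */
			}
		}
		return 0;
	}

	/* Usage: struct tls_info ti = { 0, 0 }; dl_iterate_phdr(find_pt_tls, &ti); */
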
joerg b7b592d544 Back out using the thread register (if present) for now.
libgcc_s's __register_frame_info gets called from libc's CSU code before
the libc constructors are run. __register_frame_info in turn calls
pthread_mutex_lock. libpthread is not initialised at this point and
therefore pthread__self() traps when dereferencing the thread register.
This worked before because the garbage from pthread__self() was
effectively ignored.
2011-02-25 14:32:38 +00:00
joerg 1631a78097 Allow storing and receiving the LWP private pointer via ucontext_t
on all platforms except VAX and IA64. Add fast access via register for
AMD64, i386 and SH3 ports. Use this fast access in libpthread to replace
the stack based pthread_self(). Implement skeleton support for Alpha,
HPPA, PowerPC, SPARC and SPARC64, but leave it disabled.

Ports that support this feature provide __HAVE____LWP_GETPRIVATE_FAST in
machine/types.h and a corresponding __lwp_getprivate_fast in
machine/mcontext.h.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.
2011-02-24 04:28:41 +00:00
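
To make the mechanism concrete: a port's __lwp_getprivate_fast is typically a one-instruction read of the LWP private pointer through a reserved register. The sketch below mimics the i386 flavour (a later entry in this log notes that i386 uses %gs); the function name, offset and asm constraints are assumptions for illustration, not the contents of machine/mcontext.h:

	/* Illustrative sketch only: read the LWP private pointer through
	 * the %gs segment, in the style of an i386 __lwp_getprivate_fast.
	 * The offset and constraints are assumptions, not the real
	 * machine/mcontext.h definition. */
	static __inline void *
	lwp_getprivate_fast_sketch(void)
	{
		void *p;

		__asm volatile("movl %%gs:0, %0" : "=r" (p));
		return p;
	}

	/* pthread__self() can then be little more than a cast of this value. */
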
christos 66f16a1fa6 I've had this patch in my tree for a while and since it only improves
the situation, I decided to commit it. There is an inherent problem
with ASLR and the way the pthread library is using the thread stack.

Our pthread library chooses the stack for each thread strategically
so that it can locate the pthread struct for each
thread by masking the stack pointer and looking just below the red
zone it creates. Unfortunately with ASLR you get many random values
for the initial stack, and there are situations where the masked
stack base ends up below the base of the stack. (this happens on
x86 when the stack base happens to be 0x???02000, for example, and
your stack mask is 0xffe00000). To fix this, we detect the
pathological cases (this happens only in the main thread), allocate
more stack, and mprotect it appropriately. Then we stash the main
base and the main struct, so that when we look for the pthread
struct in pthread__id, we can special case the main thread.

Another way to work around the problem is to unlimit the stack size,
but the proper way is to use TLS to find the thread structure and
not to play games with the thread stacks.
2010-12-18 15:54:27 +00:00
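
A simplified sketch of the lookup being described: mask the stack pointer to find the per-thread structure, and special-case the main thread, whose ASLR-randomized stack may not sit on a mask boundary. The mask value, names and the exact special-case test are illustrative, not libpthread's real pthread__id internals:

	#include <stdint.h>

	struct pthread_st;				/* thread control block */

	#define STACKMASK	(~((uintptr_t)0x200000 - 1))	/* e.g. 0xffe00000 on 32-bit */

	static struct pthread_st *pthread__main;	/* stashed main-thread struct */
	static uintptr_t pthread__main_lo, pthread__main_hi;	/* stashed main stack range */

	static struct pthread_st *
	pthread__id_sketch(uintptr_t sp)
	{
		/* Special-case the main thread, whose ASLR-randomized stack
		 * base (e.g. 0x???02000) need not be mask-aligned, so the
		 * masked value could point below the real stack. */
		if (sp >= pthread__main_lo && sp < pthread__main_hi)
			return pthread__main;
		/* Other stacks are allocated mask-aligned, so the thread
		 * structure sits at the masked stack base. */
		return (struct pthread_st *)(sp & STACKMASK);
	}
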
ad 61cac435e4 - Convert from makecontext() -> _lwp_makecontext().
- Rely on _lwp_makecontext() to set up the thread identity register.
  This is not currently done (a bug), nor does libpthread use the
  threadreg yet. I'm doing this so the code can be used by the
  person working on TLS to verify that their threadreg code is working.
2009-05-17 14:49:00 +00:00
ad a61915e94f Remove unused code that's confusing when using cscope/opengrok. 2009-05-16 22:20:40 +00:00
ad cbd43ffa55 Now that we have all the scheduling gunk, make these do something useful:
pthread_attr_get_np
pthread_attr_setschedparam
pthread_attr_getschedparam
pthread_attr_setschedpolicy
pthread_attr_getschedpolicy
2008-06-28 10:29:37 +00:00
ad 2bcb8bf1c4 PR lib/38741 priority inversion in libpthread breaks apps that use
SCHED_FIFO threads

- Change condvar sync so that we never take the condvar's spinlock without
  first holding the caller-provided mutex. Previously, the spinlock was only
  taken without the mutex in an error path, but it was enough to trigger the
  problem described in the PR.

- Even with this change, applications calling pthread_cond_signal/broadcast
  without holding the interlocking mutex are still subject to the problem
  described in the PR. POSIX discourages this, saying that it leads to
  undefined scheduling behaviour, which seems good enough for the time being.

- Elsewhere, use a hash of mutexes instead of per-object spinlocks to
  synchronize entry/exit from sleep queues.

- Simplify how sleep queues are maintained.
2008-05-25 17:05:28 +00:00
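
For the "hash of mutexes" in the last two points, the idea is simply to index a small table of locks by the object's address rather than embedding a spinlock in every object. The table size, shift and names below are arbitrary illustrative choices, not the library's real sleep-queue code:

	#include <pthread.h>
	#include <stdint.h>

	#define SYNC_HASH_SIZE	64		/* power of two, chosen arbitrarily */

	static pthread_mutex_t sync_locks[SYNC_HASH_SIZE];

	static void
	sync_locks_init(void)			/* call once from library init */
	{
		int i;

		for (i = 0; i < SYNC_HASH_SIZE; i++)
			pthread_mutex_init(&sync_locks[i], NULL);
	}

	static pthread_mutex_t *
	sync_lock_for(const void *obj)
	{
		uintptr_t a = (uintptr_t)obj;

		/* Shift away low bits that alignment makes identical. */
		return &sync_locks[(a >> 6) & (SYNC_HASH_SIZE - 1)];
	}
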
martin ce099b4099 Remove clause 3 and 4 from TNF licenses 2008-04-28 20:22:51 +00:00
ad 377f098ab0 Adjust mutex/rwlock definitions to match reality now that there is only
one implementation of each. PR lib/38030.
2008-02-14 21:40:51 +00:00
ad a67e1e3475 - Remove libpthread's atomic ops.
- Remove the old spinlock-based mutex and rwlock implementations.
- Use the atomic ops from libc.
2008-02-10 18:50:54 +00:00
christos c6409540ef add missing static decls. 2008-01-08 20:56:08 +00:00
ad 622bbc505a - Use pthread__cancelled() in more places.
- pthread_join(): assert that pthread_cond_wait() returns zero.
2007-12-24 16:04:20 +00:00
ad 989565f81d - Fix pthread_rwlock_trywrlock() which was broken.
- Add new functions: pthread_mutex_held_np, mutex_owner_np, rwlock_held_np,
  rwlock_wrheld_np, rwlock_rdheld_np. These match the kernel's locking
  primitives and can be used when porting kernel code to userspace.

- Always create LWPs detached. Do join/exit sync mostly in userland. When
  looped on a dual core box this seems ~30% quicker than using lwp_wait().
  Reduce number of lock acquire/release ops during thread exit.
2007-12-24 14:46:28 +00:00
ad 8077340e63 Remove the debuglog stuff. ktrace is more useful now. 2007-11-19 15:14:11 +00:00
ad 66ac2ffaf2 Mutexes:
- Play scrooge again and chop more cycles off acquire/release.
- Spin while the lock holder is running on another CPU (adaptive mutexes).
- Do non-atomic release.

Threadreg:

- Add the necessary hooks to use a thread register.
- Add the code for i386, using %gs.
- Leave i386 code disabled until xen and COMPAT_NETBSD32 have the changes.
2007-11-13 17:20:08 +00:00
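
The adaptive part amounts to: spin only while the current holder is running on another CPU, then sleep. A hedged sketch follows, in which mutex_tryacquire(), owner_running() and mutex_block() are hypothetical stand-ins for the real acquire and park paths:

	#include <stdbool.h>

	struct mutex_sketch;
	bool mutex_tryacquire(struct mutex_sketch *);	/* hypothetical: CAS on the owner word */
	bool owner_running(struct mutex_sketch *);	/* hypothetical: is the holder on-CPU? */
	void mutex_block(struct mutex_sketch *);	/* hypothetical: park until unlocked */

	static void
	adaptive_lock(struct mutex_sketch *m, int nspins)
	{
		for (;;) {
			int spins = nspins;

			if (mutex_tryacquire(m))
				return;
			/* Spin only while the holder is running elsewhere;
			 * if it is blocked, spinning cannot help, so sleep. */
			while (spins-- > 0 && owner_running(m)) {
				if (mutex_tryacquire(m))
					return;
			}
			mutex_block(m);
		}
	}
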
ad 15e9cec117 For PR bin/37347:
- Override __libc_thr_init() instead of using our own constructor.
- Add pthread__getenv() and use instead of getenv(). This is used before
  we are up and running and unfortunately getenv() takes locks.

Other changes:

- Cache the spinlock vectors in pthread__st. Internal spinlock operations
  now take 1 function call instead of 3 (i386).
- Use pthread__self() internally, not pthread_self().
- Use __attribute__ ((visibility("hidden"))) in some places.
- Kill PTHREAD_MAIN_DEBUG.
2007-11-13 15:57:10 +00:00
ad 84a6749ef2 Note that libpthread_dbg needs to be checked after making changes to
libpthread.
2007-10-16 15:21:54 +00:00
ad f1b2c1c4c9 ... but preserve the linked list, for the debugger only. 2007-10-16 15:07:02 +00:00
ad 9583eeb248 Replace the global thread list with a red-black tree. From joerg@. 2007-10-16 13:41:18 +00:00
skrll d32ed98975 Resurrect the function pointers for lock operations and allow each
architecture to provide asm versions of the RAS operations.

We do this because relying on the compiler to get the RAS right is not
sensible. (It gets alpha wrong and hppa is suboptimal)

Provide asm RAS ops for hppa.

(A slightly different version) reviewed by Andrew Doran.
2007-09-24 12:19:39 +00:00
ad 20e3392edc Add a per-mutex deferred wakeup flag so that threads doing something like
the following do not wake other threads early:

	pthread_mutex_lock(&mutex);
	pthread_cond_broadcast(&cond);
	foo = malloc(100);		/* takes libc mutexes */
	pthread_mutex_unlock(&mutex);
2007-09-13 23:51:47 +00:00
ad b0efccf4cd Make the new mutexes faster:
- Eliminate mutexattr_private and just set a bit in ptm_owner if the mutex
  is recursive. This forces the slow path to be taken for recursive mutexes.
  Overload an unused field in pthread_mutex_t to record whether or not it's
  an errorcheck mutex.
- Streamline pthread_mutex_lock / pthread_mutex_unlock a bit more. As a
  side effect makes it possible to have assembly stubs for them.
2007-09-11 18:11:29 +00:00
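
The trick in the first point is to fold the "recursive" flag into the owner word itself, so the fast-path compare-and-swap (which expects a clean zero or owner value) never succeeds for recursive mutexes and control falls through to the slow path. A sketch with an assumed bit layout, not the real pthread_mutex_t encoding:

	#include <stdint.h>

	#define MUTEX_RECURSIVE_BIT	((uintptr_t)1)	/* assumed tag bit */

	/* Tag the owner word when the mutex is created recursive. */
	static inline uintptr_t
	owner_make_recursive(uintptr_t owner)
	{
		return owner | MUTEX_RECURSIVE_BIT;
	}

	static inline int
	owner_is_recursive(uintptr_t owner)
	{
		return (owner & MUTEX_RECURSIVE_BIT) != 0;
	}

	static inline uintptr_t
	owner_thread(uintptr_t owner)
	{
		return owner & ~MUTEX_RECURSIVE_BIT;	/* strip the tag to get the owner */
	}
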
skrll 9fdaf800d9 Merge nick-csl-alignment. 2007-09-10 11:34:05 +00:00
ad f4fd6b797e - Get rid of self->pt_mutexhint and use pthread__mutex_owned() instead.
- Update some comments and fix minor bugs. Minor cosmetic changes.
- Replace some spinlocks with mutexes and rwlocks.
- Change the process private semaphores to use mutexes and condition
  variables instead of doing the synchronization directly. Spinlocks
  are no longer used by the semaphore code.
2007-09-08 22:49:50 +00:00
ad 8ccc6e060d - Don't take the mutex's spinlock (ptr_interlock) in pthread_cond_wait().
Instead, make the deferred wakeup list a per-thread array and pass down
  the lwpid_t's that way.

- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
  In this way there should never be contention on the CV's spinlock if
  the app follows POSIX rules (there should only be contention on the
  user-provided mutex).

- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
  there is contention. This is enabled where atomic ops are available. Right
  now that is only i386 and amd64 because I don't have other hardware to
  test with. It's trivial to add stubs for other architectures as long as
  they have compare-and-swap. When we have proper atomic ops the old rwlock
  code can be removed.

- Add a new mutex implementation that's similar to the kernel's mutexes, but
  uses compare-and-swap to maintain the waiters list, so no spinlocks are
  involved. Same caveats apply as for the rwlocks.
2007-09-07 14:09:27 +00:00
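
For the compare-and-swap waiters list, each blocking thread pushes itself onto a singly linked list headed in the lock object with a CAS loop, so no spinlock protects the list. The sketch below uses C11 atomics purely for illustration; the library itself used its own CAS primitive (the pthread__atomic_cas_ptr listed just below), and the structure names here are assumptions:

	#include <stdatomic.h>

	struct waiter {
		struct waiter *next;
	};

	struct lock_sketch {
		_Atomic(struct waiter *) waiters;	/* head of the waiters list */
	};

	static void
	push_waiter(struct lock_sketch *l, struct waiter *w)
	{
		struct waiter *head = atomic_load(&l->waiters);

		/* Publish ourselves as the new head; on CAS failure, head is
		 * reloaded with the current value and we retry. */
		do {
			w->next = head;
		} while (!atomic_compare_exchange_weak(&l->waiters, &head, w));
	}
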
ad a6ed47a549 Add: pthread__atomic_cas_ptr, pthread__atomic_swap_ptr, pthread__membar_full
This is a stopgap until the thorpej-atomic branch is complete.
2007-09-07 00:24:56 +00:00
ad d9adedd764 Trim fat off libpthread internal spinlock operations. Makes a measurable
improvement across the board.
2007-08-16 13:54:16 +00:00
ad b8833ff53f - Reinitialize the absolute minimum when recycling user thread state.
Chops another ~10% off create/join in a loop on i386.
- Disable low level debugging as this is stable. Improves benchmarks
  across the board by a small percentage. Uncontested mutex acquire
  and release in a loop becomes about 8% quicker.
- Minor cleanup.
2007-08-16 12:01:49 +00:00
ad 9e28719960 Remove PT_FIXEDSTACKSIZE_LG. 2007-08-16 01:09:34 +00:00
ad ed964af19e Cache thread context for creation instead of setting it up every time.
Speeds create/join loop by about 10-15% on i386.
2007-08-16 00:41:23 +00:00
ad c3f8e2ee55 Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes to sleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
2007-08-07 19:04:21 +00:00
ad 7bf06aa722 Make libpthread_dbg build again. 2007-08-04 18:54:12 +00:00
ad 50fa8db4e4 Some significant performance improvements, and a fix for a race with pthread
detach/join.

- Make mutex acquire spin for a short time, as done with spinlocks.
- Make the number of spins controllable with the env var PTHREAD_NSPINS.
- Reduce the amount of time that libpthread internal spinlocks are held.
- Rely more on the barrier effects of park/unpark to avoid taking spinlocks.
- Simplify the locking around pthreads and the global queues.
- Align per-thread sync data on a 128 byte boundary.
- Offset thread stacks by a small amount to try and reduce cache thrash.
2007-08-04 13:37:48 +00:00
ad b5a5e72af1 Mirror a fix made to the kernel's condvars:
After resuming execution, the thread must check to see if it
has been restarted as a result of pthread_cond_signal().  If it
has, but cannot take the wakeup (because of e.g. a pending Unix
signal or timeout) then try to ensure that another thread sees
it.  This is necessary because there may be multiple waiters,
and at least one should take the wakeup if possible.
2007-04-12 21:36:06 +00:00
ad a5070151ae - Test+branch is usually cheaper than making an indirect function call,
so avoid making them.
- When parking an LWP on a condition variable, point the hint argument at
  the mutex's waiters queue. Chances are we will be awoken from that later.
2007-03-24 18:51:59 +00:00
ad b0427b61fb - Maintain a per-thread pointer to the last mutex acquired by the app, to
be used only as a hint. Clear the pointer when releasing the mutex.
- When releasing a mutex, wake all waiters. Makes it possible to transfer
  waiters from another object to a mutex.
2007-03-20 23:33:10 +00:00
ad 792cc0e17d - Simplify the interface to pthread__park() and friends slightly.
- If sysctl() fails, complain.
2007-03-05 23:55:40 +00:00
ad de2138164c Remove the PTHREAD_SA option. If M:N threading is reimplemented, it's
better off done with a separate library. 2007-03-02 18:53:51 +00:00
2007-03-02 18:53:51 +00:00
ad d333bb5f2f Build without sys/sa.h present. 2007-02-06 15:24:37 +00:00
ad ded2602507 Fix bugs with and improve upon previous. 2006-12-24 18:39:45 +00:00
ad 1ac6a89b79 Conditionalised support for 1:1 threads. Needs associated kernel changes
and more work to be useful.
2006-12-23 05:14:46 +00:00
yamt 41cc94b9f0 remove unused IDLESPINS. 2006-10-03 09:37:07 +00:00
chs 0e67554241 starting the pthread library (ie. calling pthread__start()) before
any threads are created turned out to be not such a good idea.
there are stronger requirements on what has to work in a forked child
while a process is still single-threaded.  so take all that stuff
back out and fix the problems with single-threaded programs that
are linked with libpthread differently, by checking if the library
has been started and doing completely different stuff if it hasn't been:
 - for pthread_rwlock_timedrdlock(), just fail with EDEADLK immediately.
 - for sem_wait(), the only thing that can unlock the semaphore is a
   signal handler, so use sigsuspend() to wait for a signal.
 - for pthread_mutex_lock_slow(), just go into an infinite loop
   waiting for signals.

I also noticed that there's a "sem2" test that has never worked in its
single-threaded form.  the problem there is that a signal handler tries
to take a sem_t interlock which is already held when the signal is received.
fix this too, by adding a single-threaded case for sig_trywait() that
blocks signals instead of using the userland interlock.
2005-10-19 02:15:03 +00:00
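
A sketch of the single-threaded sem_wait() fallback described above: with no other threads, only a signal handler can post the semaphore, so block in sigsuspend() until some handler has run and then recheck. The sem_sketch type and its accessors are hypothetical stand-ins for the real semaphore internals:

	#include <signal.h>

	struct sem_sketch;
	unsigned sem_value(struct sem_sketch *);	/* hypothetical: current count */
	void sem_decrement(struct sem_sketch *);	/* hypothetical: consume one unit */

	static int
	sem_wait_single_threaded(struct sem_sketch *sem)
	{
		sigset_t none;

		sigemptyset(&none);
		while (sem_value(sem) == 0)
			(void)sigsuspend(&none);	/* returns once a handler has run */
		sem_decrement(sem);
		return 0;
	}
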
chs 2415c56ed0 in pthread_mutex_lock_slow(), pthread_rwlock_timedrdlock() and sem_wait(),
call pthread__start() if it hasn't already been called.  this avoids
an internal assertion from the library if these routines are used
before any threads are created and they need to sleep.
fixes PR 20256, PR 24241, PR 25722, PR 26096.
2005-10-16 00:07:24 +00:00
nathanw 916de87872 Keep the kernel updated with signal action signal masks (act.sa_mask) until
threads are started, since before that the traditional signal invocation
method will be used. Fixes regress/lib/libpthread/sigmask2.
2005-02-26 20:33:06 +00:00
mycroft 2b4ccae3e9 Remove pt_blockuc. If the debugger attempts to muck with the state of a
blocked thread, return an error; this should be done through ptrace(2).
2004-10-12 22:17:56 +00:00