on all platforms except VAX and IA64. Add fast access via register for
AMD64, i386 and SH3 ports. Use this fast access in libpthread to replace
the stack based pthread_self(). Implement skeleton support for Alpha,
HPPA, PowerPC, SPARC and SPARC64, but leave it disabled.
Ports that support this feature provide __HAVE___LWP_GETPRIVATE_FAST in
machine/types.h and a corresponding __lwp_getprivate_fast in
machine/mcontext.h.
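For illustration, a sketch of the kind of inline a port provides (this one
assumes the amd64 convention of pointing %fs at a TCB whose first word is the
private pointer; the real definition is the port's __lwp_getprivate_fast in
machine/mcontext.h, and ports without it fall back to the _lwp_getprivate(2)
syscall):

	#include <stdio.h>

	/* Sketch only: amd64-style fast access to the LWP private pointer. */
	static __inline void *
	lwp_getprivate_fast_sketch(void)
	{
		void *ptr;

		/* The kernel points %fs at the TCB; its first word is the
		 * self/private pointer, so one load fetches it. */
		__asm volatile("movq %%fs:0, %0" : "=r" (ptr));
		return ptr;
	}

	int
	main(void)
	{
		printf("lwp private = %p\n", lwp_getprivate_fast_sketch());
		return 0;
	}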
This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.
XXX This must not be enabled by default because the LWP private mechanism
is reserved for TLS. It is provided only as a test/demo.
XXX Since ucontext_t does not contain the thread private variable, for a
short time after threads are created their thread specific data is unset.
If a signal arrives during that time we are screwed.
- No longer need pthread__osrev.
- Rearrange _lwp_ctl() calls slightly.
As the size of mcontext is not changed, we avoid the hassle of
versioning all the calls that use it.
_REG_EXPEVT was never used by any code in the tree. Reporting EXPEVT
makes sense only for signals, and in that case we already pass it to
userland in ksi_trap, which is the official MI way to get this (MD)
information.
Old binaries running on new kernels will now have their GBR set from
the new mcontext, but that's OK too: GBR was not properly supported by
old kernels (not saved in the trapframe), so old binaries couldn't
possibly have used it anyway.
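For reference, the ksi_trap value shows up in userland as siginfo's si_trap
when a handler is installed with SA_SIGINFO, so the MD trap code (EXPEVT on
sh3) can be read like this (minimal sketch; the deliberate fault and the
choice of SIGSEGV are just for demonstration):

	#include <signal.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	static void
	handler(int sig, siginfo_t *info, void *ctx)
	{
		char buf[64];
		int n;

		(void)ctx;
		/* si_trap carries the machine-dependent trap code (ksi_trap). */
		n = snprintf(buf, sizeof(buf), "signal %d, si_trap %#x\n",
		    sig, info->si_trap);
		(void)write(STDERR_FILENO, buf, (size_t)n);
		_exit(1);
	}

	int
	main(void)
	{
		struct sigaction sa;

		memset(&sa, 0, sizeof(sa));
		sa.sa_sigaction = handler;
		sa.sa_flags = SA_SIGINFO;
		sigemptyset(&sa.sa_mask);
		(void)sigaction(SIGSEGV, &sa, NULL);

		*(volatile int *)8 = 0;		/* force a fault */
		return 0;
	}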
- Play scrooge again and chop more cycles off acquire/release.
- Spin while the lock holder is running on another CPU (adaptive mutexes).
- Do non-atomic release.
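In other words, the acquire path only busy-waits while the current holder is
on a CPU and sleeps otherwise, and the release needs no atomic
read-modify-write. A rough illustration of the policy (the owner/running
bookkeeping is a hypothetical stand-in, and sched_yield() stands in for
parking the LWP in the kernel):

	#include <sched.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stddef.h>

	struct owner { _Atomic bool running; };	/* hypothetical per-thread state */

	typedef struct {
		_Atomic(struct owner *) holder;	/* NULL when unlocked */
	} adaptive_mutex_t;

	static struct owner self = { true };

	static void
	adaptive_lock(adaptive_mutex_t *m)
	{
		for (;;) {
			struct owner *expected = NULL;

			/* Fast path: one CAS for the uncontended acquire. */
			if (atomic_compare_exchange_weak(&m->holder, &expected,
			    &self))
				return;

			/* Holder is on a CPU: it will likely release soon,
			 * so spin and retry. */
			struct owner *o = atomic_load(&m->holder);
			if (o != NULL && atomic_load(&o->running))
				continue;

			/* Holder is off-CPU: spinning is wasted work, so
			 * yield (stand-in for sleeping in the kernel). */
			sched_yield();
		}
	}

	static void
	adaptive_unlock(adaptive_mutex_t *m)
	{
		/* No read-modify-write on release; a plain store with
		 * release ordering is enough. */
		atomic_store_explicit(&m->holder, NULL, memory_order_release);
	}

	int
	main(void)
	{
		static adaptive_mutex_t m;

		adaptive_lock(&m);
		adaptive_unlock(&m);
		return 0;
	}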
Threadreg:
- Add the necessary hooks to use a thread register.
- Add the code for i386, using %gs.
- Leave i386 code disabled until xen and COMPAT_NETBSD32 have the changes.
- Override __libc_thr_init() instead of using our own constructor.
- Add pthread__getenv() and use it instead of getenv(). This is used before
we are up and running, and unfortunately getenv() takes locks.
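The private lookup just walks environ directly, so it can run before the
library is initialized and without taking any locks; a sketch of that kind of
helper (the name and details of the real pthread__getenv() may differ):

	#include <stddef.h>
	#include <stdio.h>
	#include <string.h>

	extern char **environ;

	/* Lock-free getenv lookalike: scans environ in place, unlike
	 * getenv(3), which may take locks. */
	static const char *
	env_lookup(const char *name)
	{
		size_t len = strlen(name);
		char **ep;

		if (environ == NULL)
			return NULL;
		for (ep = environ; *ep != NULL; ep++) {
			if (strncmp(*ep, name, len) == 0 && (*ep)[len] == '=')
				return *ep + len + 1;
		}
		return NULL;
	}

	int
	main(void)
	{
		const char *v = env_lookup("PTHREAD_DIAGASSERT");

		printf("PTHREAD_DIAGASSERT=%s\n", v != NULL ? v : "(unset)");
		return 0;
	}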
Other changes:
- Cache the spinlock vectors in pthread__st. Internal spinlock operations
now take 1 function call instead of 3 (i386).
- Use pthread__self() internally, not pthread_self().
- Use __attribute__ ((visibility("hidden"))) in some places.
- Kill PTHREAD_MAIN_DEBUG.
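On the visibility attribute: a symbol marked hidden is not exported from the
shared object, and calls to it from within the library bind directly instead
of going through the PLT. Illustrative fragment with a made-up helper name:

	__attribute__((visibility("hidden")))
	int
	example_internal_helper(int x)
	{
		/* Not exported from the .so; internal calls bypass the PLT. */
		return x + 1;
	}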
architecture to provide asm versions of the RAS operations.
We do this because relying on the compiler to get the RAS right is not
sensible. (It gets alpha wrong, and hppa is suboptimal.)
Provide asm RAS ops for hppa.
(A slightly different version) reviewed by Andrew Doran.
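For context, a restartable atomic sequence is a short straight-line region
registered with rasctl(2); if an LWP is preempted inside it, the kernel
restarts it from the beginning, so the final store must be the only visible
side effect. An i386-only illustration of the idea, not the libpthread or
hppa code (symbol names are made up; it assumes a kernel with RAS support):

	#include <sys/types.h>
	#include <sys/ras.h>

	#include <err.h>
	#include <stdio.h>

	extern char ras_tas_start[], ras_tas_end[];

	/* Test-and-set as a restartable sequence.  The whole region is one
	 * asm block so the compiler cannot reorder or spill inside it, and
	 * the store is the last instruction, as RAS requires. */
	static __attribute__((noinline)) int
	ras_test_and_set(volatile int *lock)
	{
		int old;

		__asm volatile(
		    ".globl ras_tas_start, ras_tas_end\n"
		    "ras_tas_start:\n"
		    "\tmovl (%1), %0\n"
		    "\tmovl $1, (%1)\n"
		    "ras_tas_end:\n"
		    : "=&r" (old)
		    : "r" (lock)
		    : "memory");

		return old;
	}

	int
	main(void)
	{
		volatile int lock = 0;
		int first, second;

		/* Have the kernel restart us if preempted in the region. */
		if (rasctl(ras_tas_start,
		    (size_t)(ras_tas_end - ras_tas_start), RAS_INSTALL) == -1)
			err(1, "rasctl");

		first = ras_test_and_set(&lock);
		second = ras_test_and_set(&lock);
		printf("first t&s: %d, second t&s: %d\n", first, second);
		return 0;
	}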
- Update some comments and fix minor bugs. Minor cosmetic changes.
- Replace some spinlocks with mutexes and rwlocks.
- Change the process private semaphores to use mutexes and condition
variables instead of doing the synchronization directly. Spinlocks
are no longer used by the semaphore code.
Instead, make the deferred wakeup list a per-thread array and pass down
the lwpid_t's that way.
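The semaphore structure is the textbook one: a counter protected by a mutex,
with a condition variable for sleepers. A minimal sketch of a process-private
semaphore built that way (names are illustrative, not libpthread's internals):

	#include <pthread.h>
	#include <stdio.h>

	typedef struct {
		pthread_mutex_t	mtx;
		pthread_cond_t	cv;
		unsigned	count;
	} sem_sketch_t;

	static void
	sem_sketch_init(sem_sketch_t *s, unsigned value)
	{
		pthread_mutex_init(&s->mtx, NULL);
		pthread_cond_init(&s->cv, NULL);
		s->count = value;
	}

	static void
	sem_sketch_wait(sem_sketch_t *s)
	{
		pthread_mutex_lock(&s->mtx);
		while (s->count == 0)
			pthread_cond_wait(&s->cv, &s->mtx);	/* sleep, no spinning */
		s->count--;
		pthread_mutex_unlock(&s->mtx);
	}

	static void
	sem_sketch_post(sem_sketch_t *s)
	{
		pthread_mutex_lock(&s->mtx);
		if (s->count++ == 0)
			pthread_cond_signal(&s->cv);	/* wake one waiter */
		pthread_mutex_unlock(&s->mtx);
	}

	int
	main(void)
	{
		sem_sketch_t s;

		sem_sketch_init(&s, 0);
		sem_sketch_post(&s);
		sem_sketch_wait(&s);
		printf("ok\n");
		return 0;
	}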
- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
In this way there should never be contention on the CV's spinlock if
the app follows POSIX rules (there should only be contention on the
user-provided mutex).
- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
there is contention. This is enabled where atomic ops are available. Right
now that is only i386 and amd64 because I don't have other hardware to
test with. It's trivial to add stubs for other architectures as long as
they have compare-and-swap. When we have proper atomic ops the old rwlock
code can be removed.
- Add a new mutex implementation that's similar to the kernel's mutexes, but
uses compare-and-swap to maintain the waiters list, so no spinlocks are
involved. Same caveats apply as for the rwlocks.
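Both rest on the same trick: the lock word itself is maintained with
compare-and-swap, so no internal spinlock is needed and the contended path
merely records that waiters exist. An illustrative fast path (parking and
waking of waiters is elided; all names are made up):

	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	#define LK_LOCKED	0x01u
	#define LK_WAITERS	0x02u

	typedef struct { _Atomic uint32_t word; } caslock_t;

	/* Uncontended acquire: a single CAS from 0 to LOCKED, no spinlock. */
	static int
	caslock_try(caslock_t *l)
	{
		uint32_t expected = 0;

		return atomic_compare_exchange_strong_explicit(&l->word,
		    &expected, LK_LOCKED, memory_order_acquire,
		    memory_order_relaxed);
	}

	/* Contended path: record the waiter bit, still with CAS only.  A
	 * real implementation would park the thread here and be woken on
	 * release; this sketch just retries. */
	static void
	caslock_acquire(caslock_t *l)
	{
		while (!caslock_try(l)) {
			uint32_t old = atomic_load(&l->word);

			if ((old & LK_LOCKED) != 0)
				atomic_compare_exchange_weak(&l->word, &old,
				    old | LK_WAITERS);
		}
	}

	static void
	caslock_release(caslock_t *l)
	{
		/* Hand the word back; if waiters were recorded, wake one. */
		uint32_t old = atomic_exchange_explicit(&l->word, 0,
		    memory_order_release);

		if ((old & LK_WAITERS) != 0) {
			/* a real implementation would unpark a waiter here */
		}
	}

	int
	main(void)
	{
		static caslock_t l;

		caslock_acquire(&l);
		caslock_release(&l);
		printf("ok\n");
		return 0;
	}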
locations:
- Don't declare pthread__switch_away global
- Do the PIC dance for pthread__switch_return_point and
pthread__locked_switch. Ideally these (and other) symbols would
be hidden.
Thanks to uwe@, dyoung@ and elad@ for help.
XXX sh3 is still to be done.
XXX vax does strange things.
is set, so that the single-step trap happens in the thread's context
and not in the middle of _setcontext_u.
XXX might be able to do something here with iret, too, but it needs
more testing.