$NetBSD: TODO,v 1.5 2006/12/25 11:36:36 ad Exp $

Bugs to fix, mostly with SA:

- some blocking routines (like sem_wait()) don't work if SA's aren't
  running yet, because the alarm system isn't up and running or there is no
  thread context to switch to. It would be weird to use them that
  way, but it's perfectly legal.
- There is a race between pthread_cancel() and
  pthread_cond_broadcast() or pthread_exit() over removing an item
  from the sleep queue. The locking protocols there need a little
  adjustment.
- pthread_sig.c: pthread__kill_self() passes a bogus ucontext to the handler.
  This is probably not very important.
- pthread_sig.c: Come up with a signal trampoline naming convention like
  libc's, so that GDB will have an easier time with things.
- Consider moving pthread__signal_tramp() to its own file, and building
  it with -fasynchronous-unwind-tables, so that DWARF2 EH unwinding works
  through it. (This is required for e.g. GCC's libjava.)
- Add locking to ld.elf_so so that multiple threads doing lazy binding
  don't trash things.
- Verify the cancel stub symbol trickery.

Interfaces/features to implement:
- pthread_atfork()
- priority scheduling
- libc integration:
  - foo_r interfaces
- system integration
  - some macros and prototypes belong in headers other than pthread.h
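
For reference, a minimal sketch of what callers expect from the
pthread_atfork() item above, following the POSIX description of the
interface rather than any existing code in this library. The names
log_lock, on_prepare(), on_parent(), on_child() and fork_with_handlers()
are made up for illustration. The prepare handler takes a lock before
fork(), and the parent and child handlers each release it afterwards, so
the child never inherits a lock held by a thread that does not exist there.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock protecting some library state across fork(). */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static int prepare_calls, parent_calls, child_calls;

/* Runs in the forking thread before fork(): take the lock. */
static void on_prepare(void) { pthread_mutex_lock(&log_lock); prepare_calls++; }

/* Runs in the parent after fork(): release the lock again. */
static void on_parent(void) { parent_calls++; pthread_mutex_unlock(&log_lock); }

/* Runs in the child after fork(): release the inherited lock. */
static void on_child(void) { child_calls++; pthread_mutex_unlock(&log_lock); }

/*
 * Register the handlers and fork once. Returns the child's exit status
 * (0 if the child saw its handler run and could take the lock), or -1 on
 * error. Call this only once: handlers accumulate across registrations.
 */
static int fork_with_handlers(void)
{
    int status;
    pid_t pid;

    if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int ok = child_calls == 1 &&
            pthread_mutex_trylock(&log_lock) == 0;
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```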

Features that need more/better regression tests:
- pthread_cond_broadcast()
- pthread_once()
- pthread_get/setspecific()
- signals

Things that need fixing:
- Recycle dead threads for new threads.

Ideas to play with:
- Explore the trapcontext vs. usercontext distinction in ucontext_t.
- Get rid of thread structures when too many accumulate (is this
  actually a good idea?)
- Adaptive spin/sleep locks for mutexes.
- Currently, each thread uses two real pages of memory: one at the top
  of the stack for actual stack data, and one at the bottom for the
  pthread_st. If we can get suitable space above the initial stack for
  main(), we can cut this to one page per thread. Perhaps crt0 should
  do something different (give us more space) if libpthread is linked
  in?
- Figure out whether/how to expose the inline version of
  pthread_self().
- Along the same lines, figure out whether/how to use registers reserved
  in the ABI for thread-specific-data to implement pthread_self().
- Figure out what to do with changing stack sizes.
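
A rough sketch of the adaptive spin/sleep idea from the list above, using
C11 atomics rather than this library's own lock primitives. The type
adaptive_lock, the SPIN_MIN/SPIN_MAX limits and all function names are
assumptions made for illustration. The lock spins with exponential
backoff (issuing 'pause' on x86) and, once the backoff limit is passed,
gives up the CPU with sched_yield(), which stands in here for a real
sleep/park path.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_MIN 4u     /* hypothetical initial number of pause iterations */
#define SPIN_MAX 1024u  /* hypothetical limit after which we stop spinning */

/* Hypothetical lock type; not the library's pthread_spin_t. */
typedef struct { atomic_flag held; } adaptive_lock;
#define ADAPTIVE_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Tell the CPU we are in a spin loop; a no-op on non-x86 targets. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_ia32_pause();
#endif
}

/* Try once; returns true if the lock was taken. */
static bool adaptive_trylock(adaptive_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
        memory_order_acquire);
}

/*
 * Acquire the lock. Spin with a doubling delay while the delay is at
 * most SPIN_MAX, then fall back to yielding. Returns how many times we
 * yielded, so callers can see which path was taken.
 */
static unsigned adaptive_lock_acquire(adaptive_lock *l)
{
    unsigned backoff = SPIN_MIN, yields = 0;

    while (!adaptive_trylock(l)) {
        if (backoff <= SPIN_MAX) {
            for (unsigned i = 0; i < backoff; i++)
                cpu_relax();
            backoff *= 2;
        } else {
            sched_yield();
            yields++;
        }
    }
    return yields;
}

static void adaptive_unlock(adaptive_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Whether the spin phase pays off depends on whether the lock holder is
actually running on another CPU, which user code cannot currently tell.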

Future work for 1:1 threads:

- Stress testing, particularly with multiple CPUs.

- Verify that gdb still works well (basic functionality seems to be OK).

- There is a race between pthread_exit() and pthread_create() for
  detached LWPs, where the stack (and pthread structure) could be reclaimed
  before the thread has a chance to call _lwp_exit(). Checking the return
  of _lwp_kill(target, 0) could be used to fix this but that seems a bit
  heavyweight. (See shared page item.)

- Adaptive mutexes and spinlocks (see shared page item). These need
  to implement exponential backoff to reduce bus contention. On x86 we
  need to issue the 'pause' instruction while spinning, perhaps on other
  SMT processors too.

- Have a shared page that:

  o Allows an LWP to request it not be preempted by the kernel. This would
    be used over critical sections like pthread_cond_wait(), where we can
    acquire a bunch of spin locks: being preempted while holding them would
    suck. _lwp_park() would reset the flag once in kernel mode, and there
    would need to be an equivalent way to do this from user mode. The user
    path would probably need to notice deferred preemption and call
    sched_yield() on exit from the critical section.

  o Perhaps has some kind of hint mechanism that gives us a clue about
    whether an LWP is currently running on another CPU. This could be used
    for adaptive locks, but would need to be cheap to do in-kernel.

  o Perhaps has a flag value that's reset when a detached LWP has entered
    the kernel and reached lwp_exit1(), meaning that its stack can be
    reclaimed. Again, may or may not be worth it.

- Keep a pool of dead LWPs so that we do not have to take the full hit of
  _lwp_create() every time pthread_create() is called. If nothing else
  this is important for benchmarks. There are a few different ways this
  could be implemented, but it needs to be clear if the advantages are
  real. Lots of thought and benchmarking required.

- LWPs that are parked or that have called nanosleep() (common) burn up
  kernel resources. "struct lwp" itself isn't a big deal, but the VA space
  and swap used by kernel stacks is. _lwp_park() takes a ucontext_t pointer
  in expectation that at some point we may be able to recycle the kernel
  stack and re-start the LWP at the correct point, using pageable user
  memory to hold state. It might also be useful to have a nanosleep call
  that does something similar. Again, lots of thought and benchmarking
  required. (Original idea from matt@)

- It's possible that we don't need to take so many spinlocks around
  cancellation points like pthread_cond_wait() given that _lwp_wakeup()
  and _lwp_unpark() need to synchronise anyway.

- Need to give consideration to the order in which threads enter and exit
  synchronisation objects, both in the pthread library and in the kernel.
  Commonly locks are acquired/released in order (a, b, c -> c, b, a). The
  pthread spec probably has something to say about this.

- The kernel scheduler needs improving to handle LWPs and processor affinity
  better, and user space tools like top(1) and ps(1) need to be changed to
  report correctly. Tied into that is the need for a mechanism to impose
  limits on various aspects of LWPs.

- Streamlining of the park/unpark path.

- Priority inheritance and similar nasties.
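
A sketch of the user-mode half of the "do not preempt me" protocol from
the shared page item above. No such page exists yet, so struct
lwp_shared, its two flags, and crit_enter()/crit_exit() are all
hypothetical. It assumes the kernel would set a "deferred" flag when it
holds off a preemption because "nopreempt" was set. On leaving the
critical section the user path clears its request, notices the deferred
preemption and calls sched_yield().

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-LWP words in the shared page. */
struct lwp_shared {
    atomic_bool nopreempt;  /* set by user code: please do not preempt */
    atomic_bool deferred;   /* set by the kernel: a preemption was held off */
};

/* Enter a critical section: ask the kernel not to preempt this LWP. */
static void crit_enter(struct lwp_shared *sp)
{
    atomic_store(&sp->nopreempt, true);
}

/*
 * Leave the critical section. Drop the request first, then pay back any
 * preemption the kernel deferred on our behalf. Returns true if we
 * yielded, so the behaviour is observable in tests.
 */
static bool crit_exit(struct lwp_shared *sp)
{
    atomic_store(&sp->nopreempt, false);
    if (atomic_exchange(&sp->deferred, false)) {
        sched_yield();
        return true;
    }
    return false;
}
```

The flag is cleared before the deferred check so that a preemption
arriving between the two steps is taken normally rather than lost.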