$NetBSD: TODO,v 1.8 2007/03/02 18:53:51 ad Exp $

Bugs to fix:

- Add locking to ld.elf_so so that multiple threads doing lazy binding
  don't trash things. XXX Still the case?
- Verify the cancel stub symbol trickery.

Interfaces/features to implement:

- priority scheduling
- libc integration:
  - foo_r interfaces
- system integration
  - some macros and prototypes belong in headers other than pthread.h

Features that need more/better regression tests:

- pthread_cond_broadcast()
- pthread_once()
- pthread_get/setspecific()
- signals

Ideas to play with:

- Explore the trapcontext vs. usercontext distinction in ucontext_t.

- Get rid of thread structures when too many accumulate (is this
  actually a good idea?)

- Currently, each thread uses two real pages of memory: one at the top
  of the stack for actual stack data, and one at the bottom for the
  pthread_st. If we can get suitable space above the initial stack for
  main(), we can cut this to one page per thread. Perhaps crt0 should
  do something different (give us more space) if libpthread is linked
  in?

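  A minimal picture of that layout, assuming a downward-growing stack
  (PT_STACKSIZE and pt_stack_alloc() are made-up names, not the library's):

    #include <sys/mman.h>
    #include <stddef.h>

    #define PT_STACKSIZE    (2 * 4096)          /* illustrative: two pages */

    struct pt_sketch { int pt_dummy; };         /* stand-in for pthread_st */

    /* The pthread structure occupies the lowest page of the allocation;
     * the stack itself grows down from the top of the highest page. */
    static void *
    pt_stack_alloc(struct pt_sketch **ptp)
    {
        char *base = mmap(NULL, PT_STACKSIZE, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANON, -1, 0);

        if (base == MAP_FAILED)
            return NULL;
        *ptp = (struct pt_sketch *)base;        /* bottom page: pthread_st */
        return base + PT_STACKSIZE;             /* top: initial stack pointer */
    }
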
- Figure out whether/how to expose the inline version of
  pthread_self().

- Along the same lines, figure out whether/how to use registers reserved
  in the ABI for thread-specific-data to implement pthread_self().

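  As an illustration of the register trick only (not NetBSD's actual TLS
  layout): on x86_64 the thread pointer lives in %fs, and a common
  convention is that the word at offset 0 points back at the thread
  control block, so pthread_self() can avoid a function call entirely.
  pthread__self_inline() is an invented name:

    struct __pthread_st;                /* the library's thread structure */

    static inline struct __pthread_st *
    pthread__self_inline(void)
    {
        struct __pthread_st *self;

        /* Load the self pointer stored at offset 0 from the ABI-reserved
         * thread pointer; where the pthread structure really sits
         * relative to it is ABI/port dependent. */
        __asm__("movq %%fs:0, %0" : "=r"(self));
        return self;
    }
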
- Figure out what to do with changing stack sizes.

- Stress testing, particularly with multiple CPUs.

- A race between pthread_exit() and pthread_create() for detached LWPs,
  where the stack (and pthread structure) could be reclaimed before the
  thread has a chance to call _lwp_exit(), is currently prevented by
  checking the return of _lwp_kill(target, 0). It could be done more
  efficiently. (See shared page item.)

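  The existing check amounts to something like this (pthread__lwp_gone()
  is an invented helper name; the lwpid would come from the dead thread's
  pthread structure):

    #include <sys/types.h>
    #include <errno.h>
    #include <lwp.h>

    /* _lwp_kill() with signal 0 only probes for existence: once the
     * detached LWP has called _lwp_exit() it fails with ESRCH, and the
     * stack plus pthread structure can be reclaimed safely. */
    static int
    pthread__lwp_gone(lwpid_t lid)
    {
        return _lwp_kill(lid, 0) == -1 && errno == ESRCH;
    }
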
- Adaptive mutexes and spinlocks (see shared page item). These need
  to implement exponential backoff to reduce bus contention. On x86 we
  need to issue the 'pause' instruction while spinning, perhaps on other
  SMT processors too.

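  A rough sketch of the spin/backoff loop (pthread__pause(),
  pthread__spinlock_t and SPIN_BACKOFF_MAX are invented for the example;
  the real lock word and acquire path would be the library's own):

    #include <sched.h>
    #include <stdatomic.h>

    #define SPIN_BACKOFF_MAX    128     /* illustrative tuning value */

    typedef atomic_flag pthread__spinlock_t;

    static inline void
    pthread__pause(void)
    {
    #if defined(__i386__) || defined(__x86_64__)
        __asm__ volatile("pause");      /* ease off the bus / SMT sibling */
    #endif
    }

    static void
    pthread__spin_acquire(volatile pthread__spinlock_t *lock)
    {
        unsigned int backoff = 1, i;

        while (atomic_flag_test_and_set_explicit(lock,
            memory_order_acquire)) {
            /* Back off exponentially to reduce bus contention. */
            for (i = 0; i < backoff; i++)
                pthread__pause();
            if (backoff < SPIN_BACKOFF_MAX)
                backoff <<= 1;
            else
                sched_yield();          /* eventually stop burning CPU */
        }
    }
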
- Have a shared page that:

  o Allows an LWP to request it not be preempted by the kernel. This would
    be used over critical sections like pthread_cond_wait(), where we can
    acquire a bunch of spin locks: being preempted while holding them would
    suck. _lwp_park() would reset the flag once in kernel mode, and there
    would need to be an equivalent way to do this from user mode. The user
    path would probably need to notice deferred preemption and call
    sched_yield() on exit from the critical section.

  o Perhaps has some kind of hint mechanism that gives us a clue about
    whether an LWP is currently running on another CPU. This could be used
    for adaptive locks, but would need to be cheap to do in-kernel.

  o Perhaps has a flag value that's reset once a detached LWP has entered
    the kernel and called lwp_exit1(), meaning that its stack can be
    reclaimed. Again, may or may not be worth it. (A rough layout sketch
    follows.)

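  Very roughly, the page might carry one record per LWP along these lines
  (struct lwp_shared and its field names are invented; how the kernel maps
  and indexes the page is the open question):

    #include <stdint.h>

    /* One per-LWP entry in the hypothetical user/kernel shared page. */
    struct lwp_shared {
        volatile uint32_t ls_nopreempt; /* user asks not to be preempted */
        volatile uint32_t ls_yield;     /* kernel deferred a preemption;
                                         * user should sched_yield() when
                                         * leaving the critical section */
        volatile uint32_t ls_oncpu;     /* hint: LWP is running on a CPU */
        volatile uint32_t ls_exited;    /* detached LWP is past lwp_exit1();
                                         * its stack can be reclaimed */
    };
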
- Keep a pool of dead LWPs so that we do not have to take the full hit of
  _lwp_create() every time pthread_create() is called. If nothing else
  this is important for benchmarks. There are a few different ways this
  could be implemented, but it needs to be clear if the advantages are
  real. Lots of thought and benchmarking required.

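  One of the simpler shapes this could take is a LIFO of exited threads
  consulted before calling _lwp_create(); everything below (pthread__dead,
  pthread__dead_get(), the plain mutex standing in for an internal lock)
  is invented for the sketch:

    #include <sys/types.h>
    #include <pthread.h>
    #include <stddef.h>

    /* A dead, detached thread whose LWP id and stack can be reused. */
    struct pthread__dead {
        struct pthread__dead    *pd_next;       /* free-list link */
        lwpid_t                  pd_lid;        /* kernel LWP id */
        void                    *pd_stack;      /* stack to hand back out */
    };

    static struct pthread__dead *pthread__dead_list;
    static pthread_mutex_t       pthread__dead_lock =
        PTHREAD_MUTEX_INITIALIZER;

    static struct pthread__dead *
    pthread__dead_get(void)
    {
        struct pthread__dead *pd;

        pthread_mutex_lock(&pthread__dead_lock);
        if ((pd = pthread__dead_list) != NULL)
            pthread__dead_list = pd->pd_next;
        pthread_mutex_unlock(&pthread__dead_lock);
        return pd;      /* NULL: fall back to _lwp_create() */
    }
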
- LWPs that are parked or that have called nanosleep() (common) burn up
  kernel resources. "struct lwp" itself isn't a big deal, but the VA space
  and swap used by kernel stacks is. _lwp_park() takes a ucontext_t pointer
  in expectation that at some point we may be able to recycle the kernel
  stack and re-start the LWP at the correct point, using pageable user
  memory to hold state. It might also be useful to have a nanosleep call
  that does something similar. Again, lots of thought and benchmarking
  required. (Original idea from matt@)

- Need to give consideration to the order in which threads enter and exit
  synchronisation objects, both in the pthread library and in the kernel.
  Commonly locks are acquired/released in order (a, b, c -> c, b, a).

- The kernel scheduler needs improving to handle LWPs and processor affinity
  better, and user space tools like top(1) and ps(1) need to be changed to
  report correctly. Tied into that is the need for a mechanism to impose
  limits on various aspects of LWPs.

- Streamlining of the park/unpark path.

- Priority inheritance and similar nasties.