before copying them out; rather, just use a single one. Further, follow
the example of tmpfs and others by simply allocating on the stack.
This should have the side-effect of silencing false Coverity reports like
CID 4559 and 4554.
- sysmon_envsys_register: use a SLIST to maintain allocated sme_event_drv_t
structs for later use in sysmon_task_queue_sched(). This avoids a
locking error caused by acquiring and dropping the mutex multiple times.
Suggested by rmind.
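
As a rough illustration of the pattern only (not the sysmon_envsys code
itself; the pending_event/queue_pending_events/drain_pending names are
made up), the SLIST macros from <sys/queue.h> let a batch of allocations
be queued under a single lock acquisition and handed off in a later pass:

#include <sys/queue.h>
#include <pthread.h>
#include <stdlib.h>

struct pending_event {
    int pe_type;
    SLIST_ENTRY(pending_event) pe_link;
};

static SLIST_HEAD(, pending_event) pending_list =
    SLIST_HEAD_INITIALIZER(pending_list);
static pthread_mutex_t pending_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Allocate and queue a batch of events with one lock/unlock pair. */
static int
queue_pending_events(const int *types, size_t n)
{
    struct pending_event *pe;
    size_t i;

    pthread_mutex_lock(&pending_mtx);
    for (i = 0; i < n; i++) {
        if ((pe = malloc(sizeof(*pe))) == NULL) {
            pthread_mutex_unlock(&pending_mtx);
            return -1;
        }
        pe->pe_type = types[i];
        SLIST_INSERT_HEAD(&pending_list, pe, pe_link);
    }
    pthread_mutex_unlock(&pending_mtx);
    return 0;
}

/* Later pass: pop each queued event and hand it to a consumer, which
 * takes ownership of the allocation. */
static void
drain_pending(void (*consume)(struct pending_event *))
{
    struct pending_event *pe;

    for (;;) {
        pthread_mutex_lock(&pending_mtx);
        pe = SLIST_FIRST(&pending_list);
        if (pe != NULL)
            SLIST_REMOVE_HEAD(&pending_list, pe_link);
        pthread_mutex_unlock(&pending_mtx);
        if (pe == NULL)
            break;
        (*consume)(pe);
    }
}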
the limit in some LM chips.
- Add the lm_detach() function that stops/destroys the callout and
unregisters the driver from the sysmon_envsys(9) framework.
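
The expected shape of such a detach routine is roughly the following
sketch (not the actual lm(4) source; the softc layout and the
sc_callout/sc_sme field names are only illustrative): tear down in the
reverse order of attach, so no callout can fire once the
sysmon_envsys(9) registration is gone.

#include <sys/param.h>
#include <sys/callout.h>
#include <dev/sysmon/sysmonvar.h>

struct lm_softc_sketch {
    struct callout sc_callout;        /* periodic sensor refresh */
    struct sysmon_envsys *sc_sme;     /* envsys registration */
};

static void
lm_detach_sketch(struct lm_softc_sketch *sc)
{

    /* Stop and destroy the periodic refresh callout first... */
    callout_stop(&sc->sc_callout);
    callout_destroy(&sc->sc_callout);

    /* ...then drop the sysmon_envsys(9) registration. */
    sysmon_envsys_unregister(sc->sc_sme);
}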
disabled, because some drivers still depend on the old code.
- Use a new mutex for sysmon_envsys_next_sensor_index (used in
compatibility code); otherwise bad things happen with LKMs. Thanks
to this, the hack for LKMs has been removed.
- Check in advance that the driver does not already exist on the list
before adding its sensors to the dictionary.
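
A small userland sketch of the duplicate check (the list, struct and
function names here are hypothetical, not the sysmon_envsys internals):

#include <sys/queue.h>
#include <stdbool.h>
#include <string.h>

struct drv_entry {
    const char *de_name;
    LIST_ENTRY(drv_entry) de_link;
};

static LIST_HEAD(, drv_entry) drv_list = LIST_HEAD_INITIALIZER(drv_list);

/* Return true if a driver with this name is already on the list. */
static bool
drv_already_registered(const char *name)
{
    struct drv_entry *de;

    LIST_FOREACH(de, &drv_list, de_link) {
        if (strcmp(de->de_name, name) == 0)
            return true;
    }
    return false;
}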
- Don't forget to call sme_event_unregister_all() if
sysmon_envsys_register() fails after adding the array to the global
dictionary or when creating sensors.
- Modify and add some DPRINTFs.
The lm(4) LKM works without known problems when registered and
unregistered multiple times.
sysbench benchmark in read-only mode now scales to 900
simultaneous client threads on a 4xCPU i386 system before
serious performance drop-off occurs. [ad 20070907]
Instead, make the deferred wakeup list a per-thread array and pass down
the lwpid_t's that way.
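
A simplified illustration of the idea (not the libpthread code; the
array size and the defer_wakeup/flush_deferred_wakeups names are made
up), using NetBSD's _lwp_unpark_all(2) to issue the whole batch with a
single system call:

#include <sys/types.h>
#include <lwp.h>

#define NWAKEUPS 128

static __thread lwpid_t deferred[NWAKEUPS];
static __thread size_t ndeferred;

/* Remember an LWP to wake once the current critical section ends. */
static void
defer_wakeup(lwpid_t lid)
{

    if (ndeferred == NWAKEUPS) {
        /* Array full: flush early. */
        (void)_lwp_unpark_all(deferred, ndeferred, NULL);
        ndeferred = 0;
    }
    deferred[ndeferred++] = lid;
}

/* Issue all pending wakeups with a single system call. */
static void
flush_deferred_wakeups(void)
{

    if (ndeferred != 0) {
        (void)_lwp_unpark_all(deferred, ndeferred, NULL);
        ndeferred = 0;
    }
}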
- In pthread_cond_wait(), take the mutex before dealing with early wakeup.
In this way there should never be contention on the CV's spinlock if
the app follows POSIX rules (there should only be contention on the
user-provided mutex).
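
A pseudocode-level sketch of the new ordering (all helper and struct
names here are hypothetical; this is not the libpthread source): the
waiter re-takes the user mutex first, and only then checks whether it
woke early and is still on the CV's waiter queue. Every path that
touches the waiter queue then runs with the user mutex held, so the
CV's internal spinlock sees no contention from conforming applications.

int
cond_wait_sketch(struct cv_sketch *cv, struct mutex_sketch *mtx)
{

    waiter_enqueue(cv, self());             /* join the CV's queue */
    mutex_unlock_sketch(mtx);
    sleep_until_unparked();                 /* may return early */

    mutex_lock_sketch(mtx);                 /* take the mutex first */
    if (still_on_waiter_queue(cv, self()))  /* then handle early wakeup */
        waiter_dequeue(cv, self());
    return 0;
}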
- Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if
there is contention. This is enabled where atomic ops are available. Right
now that is only i386 and amd64 because I don't have other hardware to
test with. It's trivial to add stubs for other architectures as long as
they have compare-and-swap. When we have proper atomic ops the old rwlock
code can be removed.
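
A minimal, self-contained sketch of the "no lock taken unless contended"
fast path using C11 atomics (this is not the libpthread rwlock; the
contended path here just yields, where the real code takes the internal
spinlock, queues the thread and sleeps):

#include <stdatomic.h>
#include <stdint.h>
#include <sched.h>

#define RW_WRITE_LOCKED ((uintptr_t)1 << 0)   /* low bit = writer held */
#define RW_READER_INCR  ((uintptr_t)1 << 1)   /* readers counted above it */

typedef struct {
    _Atomic uintptr_t owner;
} rwlock_sketch_t;

static void
rw_read_lock(rwlock_sketch_t *rw)
{
    uintptr_t old;

    for (;;) {
        old = atomic_load_explicit(&rw->owner, memory_order_relaxed);
        /* Fast path: no writer, add ourselves as a reader with CAS. */
        if ((old & RW_WRITE_LOCKED) == 0 &&
            atomic_compare_exchange_weak_explicit(&rw->owner, &old,
            old + RW_READER_INCR, memory_order_acquire,
            memory_order_relaxed))
            return;
        sched_yield();  /* contended: real code sleeps instead */
    }
}

static void
rw_read_unlock(rwlock_sketch_t *rw)
{

    atomic_fetch_sub_explicit(&rw->owner, RW_READER_INCR,
        memory_order_release);
}

static void
rw_write_lock(rwlock_sketch_t *rw)
{
    uintptr_t expected = 0;

    /* Fast path: the lock must be completely free for a writer. */
    while (!atomic_compare_exchange_weak_explicit(&rw->owner, &expected,
        RW_WRITE_LOCKED, memory_order_acquire, memory_order_relaxed)) {
        expected = 0;
        sched_yield();  /* contended: real code sleeps instead */
    }
}

static void
rw_write_unlock(rwlock_sketch_t *rw)
{

    atomic_store_explicit(&rw->owner, 0, memory_order_release);
}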
- Add a new mutex implementation that's similar to the kernel's mutexes, but
uses compare-and-swap to maintain the waiters list, so no spinlocks are
involved. Same caveats apply as for the rwlocks.
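
The waiter-list part can be sketched like this (again not the libpthread
code; parking and handoff are omitted): a CAS pushes the blocking thread
onto a singly-linked waiters list, and the owner claims the whole list
with one atomic exchange on unlock, so no spinlock is needed.

#include <stdatomic.h>
#include <stdbool.h>

struct waiter_sketch {
    struct waiter_sketch *next;
    /* lwpid, park state, etc. would live here in the real thing */
};

struct mutex_sketch {
    _Atomic bool locked;
    _Atomic(struct waiter_sketch *) waiters;
};

/* Push ourselves onto the waiters list without taking any spinlock. */
static void
mutex_enqueue_self(struct mutex_sketch *mtx, struct waiter_sketch *self)
{
    struct waiter_sketch *head;

    do {
        head = atomic_load_explicit(&mtx->waiters, memory_order_relaxed);
        self->next = head;
    } while (!atomic_compare_exchange_weak_explicit(&mtx->waiters,
        &head, self, memory_order_release, memory_order_relaxed));
}

/* On unlock, the owner grabs the whole list with one atomic exchange
 * and can then wake the waiters it finds there. */
static struct waiter_sketch *
mutex_grab_waiters(struct mutex_sketch *mtx)
{

    return atomic_exchange_explicit(&mtx->waiters, NULL,
        memory_order_acquire);
}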
yielding. This is a nasty band-aid but with many threads, looping over
sched_yield() wastes a huge amount of CPU time. It would be nice to have a
way to temporarily disable preemption, but it turns out that's yet another
no-brain concept that has been patented and the patent holder seems to be
suing people lately. Another alternative is probably to have kernel-assisted
spinlocks.
by yamt@.
- Introduce SOBJ_SLEEPQ_LIFO, and use it for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.
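
A toy illustration of the two insertion orders with <sys/queue.h> (not
the kernel sleep queue code): when lookups mostly target recently
inserted entries, inserting at the head keeps the expected scan short.

#include <sys/queue.h>

struct sleeper_sketch {
    int s_id;
    TAILQ_ENTRY(sleeper_sketch) s_link;
};

TAILQ_HEAD(sleepq_sketch, sleeper_sketch);

/* FIFO: the newest sleeper ends up at the tail. */
static void
sleepq_insert_fifo(struct sleepq_sketch *sq, struct sleeper_sketch *s)
{

    TAILQ_INSERT_TAIL(sq, s, s_link);
}

/* LIFO (SOBJ_SLEEPQ_LIFO): the newest sleeper sits at the head, so a
 * search that usually wants a recent sleeper finds it quickly. */
static void
sleepq_insert_lifo(struct sleepq_sketch *sq, struct sleeper_sketch *s)
{

    TAILQ_INSERT_HEAD(sq, s, s_link);
}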
- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.
- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.