Allocate/Prefetch one cache-line ahead of the one we're about to deal with.
This reduces the chances of the cpu stalling while waiting for the cache
to flush a dirty line in order to satisfy the Allocate/Prefetch request.
registers in any trap/interrupt exception frame found.
- Slight tweak to more accurately detect the correct call-site when
looking for a function's prologue.
explicitly in single file which implicitly needed it (altq_conf.c)
this avoids pulling in implicit dependency on <sys/conf.h> to every
file including <net/if.h> (which includes <altq/if_altq.h> to get altq
related structures)
tearing down a vm_map. use this to skip the pmap_update()
at the end of all the removes, which allows pmaps to optimize
pmap tear-down. also, use the new pmap_remove_all() hook to
let the pmap implemenation know what we're up to.
- use struct vm_page_md for attaching pv entries to struct vm_page
- change pseg_set()'s return value to indicate whether the spare page
was used as an L2 or L3 PTP.
- use a pool for pv entries instead of malloc().
- put PTPs on a list attached to the pmap so we can free them
more efficiently (by just walking the list) in pmap_destroy().
- use the new pmap_remove_all() interface to avoid flushing the cache and TLB
for each pmap_remove() that's done as we are tearing down an address space.
- in pmap_enter(), handle replacing an existing mapping more efficiently
than just calling pmap_remove() on it. also, skip flushing the
TSB and TLB if there was no previous mapping, since there can't be
anything we need to flush. also, preload the TSB if we're pre-setting
the mod/ref bits.
- allocate hardware contexts like the MIPS pmap:
allocate them all sequentially without reuse, then once we run out
just invalidate all user TLB entries and flush the entire L1 dcache.
- fix pmap_extract() for the case where the va is not page-aligned and
nothing is mapped there.
- fix calculation of TSB size. it was comparing physmem (which is
in units of pages) to constants that only make sense if they are
in units of bytes.
- avoid sleeping in pmap_enter(), instead let the caller do it.
- use pmap_kenter_pa() instead of pmap_enter() where appropriate.
- remove code to handle impossible cases in various functions.
- tweak asm code to pipeline a little better.
- remove many unnecessary spls and membars.
- lots of code cleanup.
- no doubt other stuff that I've forgotten.
the result of all this is that a fork+exit microbenchmark is 34% faster
and a fork+exec+exit microbenchmark is 28% faster.
cpu_idle() and new cpu_switch() to replace the old cpu_switch()
which did the lot. Runs leaner without overly blocking interrupts.
Includes cleanup of the RAS code to make use of callee-saved registers.
Benchmarks on DX4 @ 100MHz reveal a slight performance improvement
but probably not statistically signficant. More TBD to verify this.
Changes passed a pounding on Athlon @ 1GHz too.
This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.
Also provides an opportunity for optimisations if "switching to self".
Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.
All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
rely on default value. It should actually be extracted
from the bootpath instead, but that involves translating
from apple partition map entries to netbsd disklabel entries.
memory fault handler. IRIX uses irix_vm_fault, and all other emulation
use NULL, which means to use uvm_fault.
- While we are there, explicitely set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports
- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed by e_fault does not
has exactly the same protoype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));
- In IRIX share groups, all the VM space is shared, except one page.
This bounds us to have different VM spaces and synchronize modifications
to the VM space accross share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
This is the last of the 'easy' ones that Krister made me aware of.
Total savings on i386 GENERIC kernel: 13151 bytes
RAIDframe in GENERIC is now at: 179033
Thanks again Krister!
are related to using libsa's alloc(). Problems go away with this alloc().
The problem is that the libsa alloc() assumes we can grab memory off
the end of the program. That assumption doesn't work for us. It's
much better to use the alloc() we were using as it calls OF_claim()
to get memory.
This ensures we start from the actual call site, not the return address.
The latter may actually be in the next consecutive function if the current
function has the __noreturn__ attribute and the alignment is Just Right.
pointer, in case the caller grew its stack dynamically.
Also beef up the checks to catch cases where the call stack passes
through the exception handling code in locore. In this case, the
frame pointer and program counter are in the trapframe/intrframe.
This eliminates problems where the underlying interrupt handler isn't the
specific layer calling scsipi_complete() for a given scsi transaction.
This avoids deadlocks where the kthread that called the autoconf routines
to configure a scsibus shouldn't be the one put to sleep waiting on a
scsipi_complete (only the scsibus's kthread should be doing that).
To avoid jitter this will force the scsibus's to probe in the order they
run through autoconf (so machines with multiple bus's don't move sd* devices
around on every reboot).