walking the page tables whenever this information is needed.
Add an option PMAP_COUNT_DEBUG to assert that the new counts and the
page table walk agree.
The old approach had a severe performance impact, seen for example as
high CPU load when running top(1).
Thanks to Simon Burge for pointing out the cause of the problem and
to Valeriy E. Ushakov for optimizing my simple-minded assembler code.
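As a rough sketch of the idea (the struct layout and the pmap_count_walk()
helper below are assumptions for illustration, not the committed sparc64
code): the counts live in the pmap and are kept current by
pmap_enter()/pmap_remove(), and with "options PMAP_COUNT_DEBUG" the cached
value is cross-checked against a full page-table walk:

    #include <sys/param.h>
    #include <sys/systm.h>              /* panic() */

    struct pmap_stats {
            long resident_count;        /* valid mappings in this pmap */
            long wired_count;           /* wired mappings in this pmap */
    };

    struct pmap {
            struct pmap_stats pm_stats; /* kept current by enter/remove */
            /* page-table roots, context number, ... */
    };

    long pmap_count_walk(struct pmap *pm);  /* hypothetical slow recount */

    long
    pmap_resident_count(struct pmap *pm)
    {
    #ifdef PMAP_COUNT_DEBUG
            long walked = pmap_count_walk(pm);

            if (pm->pm_stats.resident_count != walked)
                    panic("pmap_resident_count: cached %ld != walked %ld",
                        pm->pm_stats.resident_count, walked);
    #endif
            return (pm->pm_stats.resident_count);
    }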
- use struct vm_page_md for attaching pv entries to struct vm_page
- change pseg_set()'s return value to indicate whether the spare page
was used as an L2 or L3 PTP.
- use a pool for pv entries instead of malloc().
- put PTPs on a list attached to the pmap so we can free them
more efficiently (by just walking the list) in pmap_destroy().
- use the new pmap_remove_all() interface to avoid flushing the cache and TLB
for each pmap_remove() that's done as we are tearing down an address space.
- in pmap_enter(), handle replacing an existing mapping more efficiently
than just calling pmap_remove() on it. also, skip flushing the
TSB and TLB if there was no previous mapping, since there can't be
anything we need to flush. also, preload the TSB if we're pre-setting
the mod/ref bits.
- allocate hardware contexts like the MIPS pmap:
allocate them all sequentially without reuse, then once we run out
just invalidate all user TLB entries and flush the entire L1 dcache
(see the sketch after this entry).
- fix pmap_extract() for the case where the va is not page-aligned and
nothing is mapped there.
- fix calculation of TSB size. it was comparing physmem (which is
in units of pages) to constants that only make sense if they are
in units of bytes.
- avoid sleeping in pmap_enter(); instead let the caller do it.
- use pmap_kenter_pa() instead of pmap_enter() where appropriate.
- remove code to handle impossible cases in various functions.
- tweak asm code to pipeline a little better.
- remove many unnecessary spls and membars.
- lots of code cleanup.
- no doubt other stuff that I've forgotten.
the result of all this is that a fork+exit microbenchmark is 34% faster
and a fork+exec+exit microbenchmark is 28% faster.
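For the context-allocation item above, a hedged sketch of the MIPS-style
scheme (the helper names and the context count are made up for
illustration): contexts are handed out sequentially and never reused, and
when they run out the allocator flushes all user TLB entries plus the whole
L1 dcache and restarts the numbering.

    void    tlb_flush_all_user(void);   /* hypothetical MD helpers */
    void    dcache_flush_all(void);

    #define NUMCTX  8192                /* hardware contexts (example value) */

    static unsigned int ctx_next = 1;   /* context 0 belongs to the kernel */

    unsigned int
    ctx_alloc(void)
    {
            if (ctx_next == NUMCTX) {
                    /* Out of contexts: invalidate everything and recycle. */
                    tlb_flush_all_user();
                    dcache_flush_all();
                    ctx_next = 1;
            }
            return (ctx_next++);
    }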
This will allow improvements to the pmaps so that they can more easily defer expensive operations, e.g. TLB/cache flushes, until the last possible moment.
Currently this is a no-op on most platforms, so they should see no difference.
Reviewed by Jason.
Only call overhead is incurred as we start sprinkling pmap_update() calls
throughout the source tree (no pmaps currently defer operations, but
we are adding the infrastructure to allow them to do so).
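The intended call pattern, as a minimal sketch (the wrapper function is
illustrative, and the pmap_update(pmap_t) form shown here should be checked
against the prototype in the tree at hand): do the batch of pmap operations,
then one pmap_update() so a deferring pmap can flush the TLB/cache once at
the end.

    #include <sys/param.h>
    #include <uvm/uvm_extern.h>

    void
    unmap_range(struct pmap *pm, vaddr_t sva, vaddr_t eva)
    {
            /* A deferring pmap may only queue the flushes here... */
            pmap_remove(pm, sva, eva);

            /* ...and must have completed them by the time this returns. */
            pmap_update(pm);
    }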
Remove the entire idea of fasttrap interrupts since V9 traps are really cheap,
the CPUs are really fast, and the completely different trap frames would make
these handlers really difficult to implement.
pmap_changeprot() was only used by the clock and one other place; deprecate it.
probeget() and probeset() now take 64-bit addresses even in 32-bit mode so we
can probe IO locations by physical addresses.
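As a hedged illustration of what the wider addresses enable (the exact
probeget() prototype, the ASI constant, and the -1-on-fault convention below
are assumptions of this sketch; check the machine headers), a driver can now
test for a device by physical address even from a 32-bit kernel:

    #include <sys/param.h>
    #include <machine/ctlreg.h>

    int
    device_responds(paddr_t regpa)
    {
            /* A faulting access is assumed to return -1 here. */
            return (probeget(regpa, ASI_PHYS_NON_CACHED, 4) != -1);
    }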
Some pmap cleanup.
Some more copyright cleanup.
u-area in machine-dependent code. Instead, call exit2() to schedule
the reaper to free them for us, once it is safe to do so (i.e. we are
no longer running on the dead proc's vmspace and stack).
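A rough sketch of the shape this gives cpu_exit() (the details vary by port
and the final switch-away call is illustrative, not any port's literal
code): the MD code stops freeing the vmspace/u-area itself and instead lets
exit2() queue the proc for the reaper.

    #include <sys/param.h>
    #include <sys/proc.h>

    void
    cpu_exit(struct proc *p)
    {
            /*
             * Hand the proc to the reaper: exit2() arranges for the
             * vmspace and u-area to be freed later, once we are no
             * longer running on them.
             */
            exit2(p);

            /* Switch away for the last time; this does not return. */
            cpu_switch(p);
    }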
Moved from a two-level 512/1024-entry setup mapping 32 bits (9/10/13) to a
three-level 1024/1024/1024-entry setup mapping 43 bits (10/10/10/13).
In 32-bit mode we waste about 1/12 pages mapping the high
11 bits. We also only manage 43 of the possible 44 bits of virtual address
space, wasting half of it. Oh well, maybe we'll do better next revision.
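The bit arithmetic, as a sketch with made-up macro names (not the pmap's
own): 8 KB pages give 13 offset bits, and each 1024-entry level consumes 10
index bits, so three levels cover 10+10+10+13 = 43 bits.

    #define PTSZ            1024    /* entries per page-table level */
    #define PTSHIFT         10      /* log2(PTSZ) */
    #define PGSHIFT         13      /* 8 KB pages */

    /* index into each level for a given virtual address */
    #define VA_L1_IDX(va)   (((va) >> (PGSHIFT + 2 * PTSHIFT)) & (PTSZ - 1))
    #define VA_L2_IDX(va)   (((va) >> (PGSHIFT + PTSHIFT)) & (PTSZ - 1))
    #define VA_L3_IDX(va)   (((va) >> PGSHIFT) & (PTSZ - 1))

    /* bits mapped: 10 (L1) + 10 (L2) + 10 (L3) + 13 (offset) = 43 */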
Added genassym.cf
Removed lderr, which should never have gotten in
Removed lots of dead code from locore.s
Added some softint stuff to intr.c
Added support for halt -p
esp and le both use bus_dmamap_*() functions now
instead of kdvma_mapin() (see the sketch after this list)
Groundwork for PCI (but we still have no drivers for
any sun4u PCI devices)
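For the kdvma_mapin() conversion above, a hedged sketch of the generic
bus_dma(9) pattern the drivers now follow (error handling trimmed; the
function name and single-segment assumption are for illustration only):

    #include <sys/param.h>
    #include <machine/bus.h>

    int
    start_dma_recv(bus_dma_tag_t dmat, void *buf, bus_size_t len,
        bus_dmamap_t *mapp)
    {
            int error;

            /* One segment is enough for this small example. */
            error = bus_dmamap_create(dmat, len, 1, len, 0,
                BUS_DMA_NOWAIT, mapp);
            if (error)
                    return (error);

            /* Translate the kernel buffer into a device-visible address. */
            error = bus_dmamap_load(dmat, *mapp, buf, len, NULL,
                BUS_DMA_NOWAIT);
            if (error) {
                    bus_dmamap_destroy(dmat, *mapp);
                    return (error);
            }

            /* Sync before the device starts writing into the buffer. */
            bus_dmamap_sync(dmat, *mapp, 0, len, BUS_DMASYNC_PREREAD);

            /* Program the device with (*mapp)->dm_segs[0].ds_addr here. */
            return (0);
    }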