directly. The old code was totally bogus for the new pmap. New code
lifted from SH5 port.
Fixes panics in ffs_balloc_ufs2() seen while stress-testing a file
system on an XScale-based server platform.
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html
There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.
Fixes PR kern/21517.
space is advertised to UVM by making virtual_avail and virtual_end
first-class variables exported by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.
This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().
This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.
This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
thus giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
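A minimal sketch of the MD side of this, assuming a hypothetical port
(KERNEL_VM_BASE, KERNEL_VM_SIZE and the function names below are
illustrative placeholders, not taken from any particular port):

    #include <sys/param.h>
    #include <uvm/uvm_extern.h>

    void
    md_pmap_bootstrap(void)
    {
            /* Advertise the managed kernel VA range before main() runs. */
            virtual_avail = KERNEL_VM_BASE;
            virtual_end = KERNEL_VM_BASE + KERNEL_VM_SIZE;
    }

    vaddr_t
    md_steal_kva(vsize_t size)
    {
            vaddr_t va = virtual_avail;

            /* Anything that steals KVA must adjust the exported variables. */
            virtual_avail += round_page(size);
            return va;
    }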
for L2 allocation. This avoids potential recursive calls into
uvm_km_kmemalloc() via the pool allocator.
Bug spotted by Allen Briggs while trying to boot on a machine with 512MB
of memory.
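The shape of such an allocator, roughly (a sketch; the function names
are hypothetical, only the pool_allocator hook-up reflects the real
pool(9) interface):

    #include <sys/pool.h>

    /*
     * Back-end allocator handing out pages from a statically reserved
     * region, so allocating an L2 table can never recurse back into
     * uvm_km_kmemalloc() through the default pool page allocator.
     */
    static void *pmap_l2_page_alloc(struct pool *, int);
    static void  pmap_l2_page_free(struct pool *, void *);

    struct pool_allocator pmap_l2_allocator = {
            pmap_l2_page_alloc, pmap_l2_page_free, 0 /* default pagesz */,
    };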
* Define a new "MMU type", ARM_MMU_SA1. While the SA-1's MMU is basically
compatible with the generic, the SA-1 cache does not have a write-through
mode, and it is useful to have an indication of this.
* Add a new PMAP_NEEDS_PTE_SYNC indicator, and try to evaluate it at
compile time. We evaluate it like so:
- If SA-1-style MMU is the only type configured -> 1
- If SA-1-style MMU is not configured -> 0
- Otherwise, defer to a run-time variable.
If PMAP_NEEDS_PTE_SYNC might evaluate to true (SA-1 only or run-time
check), then we also define PMAP_INCLUDE_PTE_SYNC so that e.g. assembly
code can include the necessary run-time support (see the sketch after
this list). PMAP_INCLUDE_PTE_SYNC
largely replaces the ARM32_PMAP_NEEDS_PTE_SYNC manual setting Steve
included with the original new pmap.
* In the new pmap, make pmap_pte_init_generic() check to see if the CPU
has a write-back cache. If so, init the PT cache mode to C=1,B=0 to get
write-through mode. Otherwise, init the PT cache mode to C=1,B=1.
* Add a new pmap_pte_init_arm8(). Old pmap, same as generic. New pmap,
sets page table cacheability to 0 (ARM8 has a write-back cache, but
flushing it is quite expensive).
* In the new pmap, make pmap_pte_init_arm9() reset the PT cache mode to
C=1,B=0, because the write-back check in generic gets it wrong for ARM9:
we use write-through mode all the time on ARM9 right now. (What this
really tells me is that the test for write-through cache is less than
perfect, but we can fix that later.)
* Add a new pmap_pte_init_sa1(). Old pmap, same as generic. New pmap,
does generic initialization, then resets page table cache mode to
C=1,B=1, since C=1,B=0 does not produce write-through on the SA-1.
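The compile-time evaluation described above boils down to something
like the following sketch (ARM_NMMUS and pmap_needs_pte_sync are
illustrative stand-ins; only PMAP_NEEDS_PTE_SYNC, PMAP_INCLUDE_PTE_SYNC
and ARM_MMU_SA1 are named by the change itself):

    #if ARM_MMU_SA1 == 1 && ARM_NMMUS == 1
    /* SA-1-style MMU is the only type configured: always sync PTEs. */
    #define PMAP_NEEDS_PTE_SYNC     1
    #define PMAP_INCLUDE_PTE_SYNC
    #elif ARM_MMU_SA1 == 0
    /* No SA-1-style MMU configured: never needed. */
    #define PMAP_NEEDS_PTE_SYNC     0
    #else
    /* Mixed configuration: defer to a run-time variable. */
    extern int pmap_needs_pte_sync;
    #define PMAP_NEEDS_PTE_SYNC     pmap_needs_pte_sync
    #define PMAP_INCLUDE_PTE_SYNC
    #endif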
was checked in:
* It was not actually disabling the MMU, and so jumping to the
reset vector would happily cause a panic(), since it would be
the kernel's reset vector, not the ROM's.
* In the event the system was using high vectors, VECRELOC was not
getting cleared, which has the potential to wreak havoc when re-entering
the ROM.
* It was totally broken for CPUs < ARMv4; you still need to disable
the MMU on those, you just need to skip the ARMv4 TLB flush.
* The code that was checked in would only work if the kernel is mapped
VA==PA. For systems where the kernel is NOT mapped VA==PA, you only
get the prefetch depth # of insns (2) after the MMU is turned off before
you have to fix the PC.
Backing out the change fixes rebooting on several evbarm platforms.
requires that the CPU control register be properly readable. I believe that
all CPUs that have high vector support have a readable CPU control register,
but if we ever encounter one that does not, then we'll have to adjust this
code.
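A sketch of the kind of check this implies (the helper name is
hypothetical; cpu_control() and CPU_CONTROL_VECRELOC are the existing
cpufunc/armreg names, but treat the exact usage and headers as
illustrative):

    #include <sys/types.h>
    #include <arm/armreg.h>
    #include <arm/cpufunc.h>

    /*
     * Pick the vector page location from the control register's
     * high-vectors (VECRELOC) bit; this only works if the control
     * register is readable on the CPU in question.
     */
    static vaddr_t
    vector_page_va(void)
    {
            if (cpu_control(0, 0) & CPU_CONTROL_VECRELOC)
                    return ARM_VECTORS_HIGH;        /* 0xffff0000 */
            return ARM_VECTORS_LOW;                 /* 0x00000000 */
    }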
Some features of the new pmap are:
- It allows L1 descriptor tables to be shared efficiently between
multiple processes. A typical "maxusers 32" kernel, where NPROC is set
to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
with just 4 L1s. This completely solves the problem of running out
of contiguous physical memory for allocating new L1s at runtime on a
busy system.
- Much improved cache/TLB management "smarts". This change ripples
out to encompass the low-level context switch code, which is also
much smarter about when to flush the cache/TLB, and when not to.
- Faster allocation of L2 page tables and associated metadata thanks,
in part, to the pool_cache enhancements recently contributed to
NetBSD by Wasabi Systems.
- Faster VM space teardown due to accurate referenced tracking of L2
page tables.
- Better/faster cache-alias tracking.
The new pmap is enabled by adding options ARM32_PMAP_NEW to the kernel
config file, and making the necessary changes to the port-specific
initarm() function. Several ports have already been converted and will
be committed shortly.
to extract the physical address from the virtual.
On the ARM, also use the "read-only at MMU" indication to avoid a
redundant cache clean operation.
Other platforms should use these two as examples of how to use these
new pool/mbuf features to improve network performance. Note this requires
a platform to provide a working POOL_VTOPHYS().
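A sketch of what the port-provided macro might look like, assuming the
platform can translate pool VAs with vtophys():

    /*
     * Let the pool/mbuf code extract the physical address of a
     * pool-allocated virtual address; the exact definition is
     * per-platform.
     */
    #define POOL_VTOPHYS(va)        vtophys((vaddr_t)(va))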
Part 3 in a series of simple patches contributed by Wasabi Systems
to improve network performance.
cache based on CPU id: write-through on PXA2[15]0 B2 stepping and
earlier; write-back on C0 and C1 stepping (a.k.a. PXA2[15]5 A0).
The options XSCALE_CACHE_WRITE_{THROUGH,BACK} can override this.
For XScale CPUs other than PXA2xx, XSCALE_CACHE_WRITE_THROUGH works
the same as before.
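Roughly, the selection looks like this sketch (the options header
name, the flag and the stepping-test helper are made up for
illustration; only the option names come from the change):

    #include <sys/types.h>
    #include "opt_xscale_cache.h"                   /* assumed header name */

    int xscale_cache_write_through;                 /* consumed by cache setup */

    extern int pxa2x0_stepping_b2_or_earlier(u_int);        /* hypothetical */

    void
    xscale_setup_cache_mode(u_int cpuid)            /* hypothetical helper */
    {
    #if defined(XSCALE_CACHE_WRITE_THROUGH)
            xscale_cache_write_through = 1;         /* forced by option */
    #elif defined(XSCALE_CACHE_WRITE_BACK)
            xscale_cache_write_through = 0;         /* forced by option */
    #else
            /*
             * PXA2[15]0 B2 stepping and earlier: write-through.
             * C0/C1 stepping (a.k.a. PXA2[15]5 A0): write-back.
             */
            xscale_cache_write_through =
                pxa2x0_stepping_b2_or_earlier(cpuid);
    #endif
    }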
too many bits (including some reserved ones) and was writing the wrong value
for the TLB flush.
Also, if the flag is off, don't write the control register!
ints to unsigned int/u_int where they shouldn't go negative.
ints to boolean_t where they're being used as bools.
No real functional change (in the produced asm a few condition codes changed).
* Define an ARM_INTR_IMPL option, which specifies a header file
describing the interrupt implementation for the platform. Use
this instead of the list of EVBARM_BOARDTYPE checks (see the sketch below).
* Make the s3c2xx0 interrupt dispatch code a bit more generic, and move
it to a generic location so that other platforms can use it.
This eliminates all uses of the EVBARM_BOARDTYPE stuff, so delete it.
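The header selection works roughly like this (a sketch of the pattern;
the opt_*.h name and the example path are illustrative):

    /*
     * In the MI ARM interrupt header: pull in whichever implementation
     * header the kernel config named, e.g.
     *   options ARM_INTR_IMPL="<some/port/intr_impl.h>"
     */
    #if defined(_KERNEL_OPT)
    #include "opt_arm_intr_impl.h"          /* assumed options header */
    #endif

    #if defined(ARM_INTR_IMPL)
    #include ARM_INTR_IMPL
    #else
    #error ARM_INTR_IMPL not defined
    #endif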
backed by physical pages (i.e. because it reused a previously-freed one),
so that we can skip a bunch of useless work in that case.
This fixes the underlying problem behind PR 18543, and also speeds up fork()
quite a bit (e.g. 7% on my pc, 1% on my ultra2) when we get a cache hit.
When looking to reenable caching, only do so if all the pages aren't already
cached.
Convert some ints to unsigned int. (Scarily, this actually shows the biggest
decrease in timing for my benchmark; I guess the compiler can optimise better.)
Use pmap_free_pvs in pmap_remove; this should save on the overhead of
freeing each pv on its own (see the sketch at the end of this entry).
Correctly set ptp when calling pmap_enter_pv; this adds more overhead, but
the effect is minimal. Timings show that it increases gmake's make configure
step from 2:07.90 to 2:08.90. I've more optimisations planned that should
negate this increase.
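For the pmap_free_pvs() change mentioned above, the idea is simply to
return a whole chain of pv entries in one call instead of paying the
per-entry overhead; roughly (illustrative names, not the committed
code):

    /*
     * Free an entire pv chain in one go: take the allocator lock once
     * rather than once per pv_entry.
     */
    static void
    pmap_free_pvs(struct pmap *pmap, struct pv_entry *pvs)
    {
            struct pv_entry *pv, *npv;

            simple_lock(&pvalloc_lock);
            for (pv = pvs; pv != NULL; pv = npv) {
                    npv = pv->pv_next;
                    pmap_free_pv_doit(pv);  /* back onto the free list */
            }
            simple_unlock(&pvalloc_lock);
    }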