* Define a new "MMU type", ARM_MMU_SA1. While the SA-1's MMU is basically
compatible with the generic, the SA-1 cache does not have a write-through
mode, and it is useful to have an indication of this.
* Add a new PMAP_NEEDS_PTE_SYNC indicator, and try to evaluate it at
compile time. We evaluate it like so (see the sketch after this list):
- If SA-1-style MMU is the only type configured -> 1
- If SA-1-style MMU is not configured -> 0
- Otherwise, defer to a run-time variable.
If PMAP_NEEDS_PTE_SYNC might evaluate to true (SA-1 only or run-time
check), then we also define PMAP_INCLUDE_PTE_SYNC so that e.g. assembly
code can include the necessary run-time support. PMAP_INCLUDE_PTE_SYNC
largely replaces the ARM32_PMAP_NEEDS_PTE_SYNC manual setting Steve
included with the original new pmap.
* In the new pmap, make pmap_pte_init_generic() check to see if the CPU
has a write-back cache. If so, init the PT cache mode to C=1,B=0 to get
write-through mode. Otherwise, init the PT cache mode to C=1,B=1.
* Add a new pmap_pte_init_arm8(). Old pmap, same as generic. New pmap,
sets page table cacheability to 0 (ARM8 has a write-back cache, but
flushing it is quite expensive).
* In the new pmap, make pmap_pte_init_arm9() reset the PT cache mode to
C=1,B=0, since the write-back check in the generic routine gets it wrong
for ARM9: we currently use write-through mode on ARM9 all the time. (What
this really tells me is that the test for a write-through cache is less
than perfect, but we can fix that later.)
* Add a new pmap_pte_init_sa1(). Old pmap, same as generic. New pmap,
does generic initialization, then resets page table cache mode to
C=1,B=1, since C=1,B=0 does not produce write-through on the SA-1.
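As a sketch only (the ARM_NMMUS count and the pmap_needs_pte_sync variable
name are assumptions, not quotes from the tree), the compile-time evaluation
described above could look like this:

    #if (ARM_MMU_SA1 == 1) && (ARM_NMMUS == 1)
    #define PMAP_NEEDS_PTE_SYNC     1       /* SA-1 class is the only MMU type */
    #define PMAP_INCLUDE_PTE_SYNC
    #elif (ARM_MMU_SA1 == 0)
    #define PMAP_NEEDS_PTE_SYNC     0       /* no SA-1 class MMU configured */
    #else
    extern int pmap_needs_pte_sync;         /* assumed name for the run-time flag */
    #define PMAP_NEEDS_PTE_SYNC     pmap_needs_pte_sync
    #define PMAP_INCLUDE_PTE_SYNC
    #endif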
was checked in:
* It was not actually disabling the MMU, and so jumping to the
reset vector would happily cause a panic(), since it would be
the kernel's reset vector, not the ROM's.
* In the event the system was using high vectors, VECRELOC was not
getting cleared, which has the potential to wreak havoc when re-entering
the ROM.
* It was totally broken for CPUs < ARMv4; you still need to disable
the MMU on those, you just need to skip the ARMv4 TLB flush.
* The code that was checked in would only work if the kernel is mapped
VA==PA. For systems where the kernel is NOT mapped VA==PA, you only
get the prefetch-depth's worth of instructions (2) after the MMU is turned
off before you have to fix up the PC.
Backing out the change fixes rebooting on several evbarm platforms.
requires that the CPU control register be properly readable. I believe that
all CPUs that have high vector support have a readable CPU control register,
but if we ever encounter one that does not, then we'll have to adjust this
code.
Some features of the new pmap are:
- It allows L1 descriptor tables to be shared efficiently between
multiple processes. A typical "maxusers 32" kernel, where NPROC is set
to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
with just 4 L1s. This completely solves the problem of running out
of contiguous physical memory for allocating new L1s at runtime on a
busy system.
- Much improved cache/TLB management "smarts". This change ripples
out to encompass the low-level context switch code, which is also
much smarter about when to flush the cache/TLB, and when not to.
- Faster allocation of L2 page tables and associated metadata thanks,
in part, to the pool_cache enhancements recently contributed to
NetBSD by Wasabi Systems.
- Faster VM space teardown due to accurate reference tracking of L2
page tables.
- Better/faster cache-alias tracking.
The new pmap is enabled by adding options ARM32_PMAP_NEW to the kernel
config file, and making the necessary changes to the port-specific
initarm() function. Several ports have already been converted and will
be committed shortly.
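For example, enabling the new pmap in a kernel config file is just the one
line (option name as given above):

    options         ARM32_PMAP_NEW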
page: -1 for error, 0 for EOF, 1 otherwise. Inspired by an OpenBSD commit
message, pointed out by Miod Vallat in private mail.
vax/mba/hp.c: check for a return value <= 0, not < 0, to be consistent with
how other places handle return values from bounds_check_with_label().
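A minimal sketch of the convention in a hypothetical driver routine; the
(struct buf *, struct disklabel *, int) argument list is the traditional one
and may not match every version of the interface:

    #include <sys/param.h>
    #include <sys/buf.h>
    #include <sys/disklabel.h>

    /*
     * Hypothetical fragment: bounds_check_with_label() returns -1 on
     * error, 0 at EOF, and 1 if the transfer may proceed, so checking
     * "<= 0" terminates the request in both of the non-1 cases.
     */
    void
    xx_start_io(struct buf *bp, struct disklabel *lp, int wlabel)
    {
            if (bounds_check_with_label(bp, lp, wlabel) <= 0) {
                    biodone(bp);            /* error or EOF: finish the buf */
                    return;
            }
            /* ... otherwise queue the transfer ... */
    }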
to extract the physical address from the virtual.
On the ARM, also use the "read-only at MMU" indication to avoid a
redundant cache clean operation.
Other platforms should use these two as examples of how to use these
new pool/mbuf features to improve network performance. Note this requires
a platform to provide a working POOL_VTOPHYS().
Part 3 in a series of simple patches contributed by Wasabi Systems
to improve network performance.
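As an illustration only (the MIPS spelling is an assumption, not a quote from
the tree), a POOL_VTOPHYS() definition on a port with a direct-mapped kernel
segment can be a simple address calculation:

    /*
     * Sketch: convert a KSEG0 virtual address to its physical address.
     * Ports without a direct map need a real VA-to-PA lookup instead.
     */
    #define POOL_VTOPHYS(va)        MIPS_KSEG0_TO_PHYS((vaddr_t)(va))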
objects. Clients of the pool_cache API must either consistently use
the "paddr" variants or consistently not use them; otherwise behavior is
undefined.
Enable this on Alpha, ARM, MIPS, and x86. Other platforms must
define POOL_VTOPHYS() in the appropriate manner in order to enable
the feature.
Part 1 of a series of simple patches contributed by Wasabi Systems
to improve network performance.
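A hypothetical client of the "paddr" variants, consistent with the rule above
that a given cache uses either the paddr flavour or the plain flavour, never
a mix ("mycache" and the wrapper names are invented for illustration; newer
kernels spell the first argument as pool_cache_t):

    #include <sys/param.h>
    #include <sys/pool.h>

    extern struct pool_cache mycache;       /* initialized elsewhere */

    /* Allocate an object and report its physical address via *pap. */
    void *
    my_alloc(paddr_t *pap)
    {
            return pool_cache_get_paddr(&mycache, PR_NOWAIT, pap);
    }

    /* The free side must hand back the matching physical address. */
    void
    my_free(void *obj, paddr_t pa)
    {
            pool_cache_put_paddr(&mycache, obj, pa);
    }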
cache based on CPU id: write-through on the PXA2[15]0 B2 stepping and
earlier, write-back on the C0 and C1 steppings (a.k.a. PXA2[15]5 A0).
The options XSCALE_CACHE_WRITE_{THROUGH,BACK} can override this.
For XScale CPUs other than PXA2xx, XSCALE_CACHE_WRITE_THROUGH works the
same as before.
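For example, a config fragment forcing write-through behaviour regardless of
the detected stepping:

    options         XSCALE_CACHE_WRITE_THROUGH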
* The offset format was wrong.
* There is no post-increment or index register update.
* It wasn't even matching because the mask was wrong.
Also touch up ldr/str disassembly slightly.
too many bits (including some reserved ones) and was writing the wrong value
for the TLB flush.
Also, if the flag is off, don't write the control register!
indirect configuration (because the BECC is a soft-core, it could
have a variety of peripherals in the FPGA). Also add support for
local untranslated DMA.