* Define a new "MMU type", ARM_MMU_SA1. While the SA-1's MMU is basically
compatible with the generic, the SA-1 cache does not have a write-through
mode, and it is useful to know have an indication of this.
* Add a new PMAP_NEEDS_PTE_SYNC indicator, and try to evaluate it at
compile time. We evaluate it like so:
- If SA-1-style MMU is the only type configured -> 1
- If SA-1-style MMU is not configured -> 0
- Otherwise, defer to a run-time variable.
If PMAP_NEEDS_PTE_SYNC might evaluate to true (SA-1 only or run-time
check), then we also define PMAP_INCLUDE_PTE_SYNC so that e.g. assembly
code can include the necessary run-time support. PMAP_INCLUDE_PTE_SYNC
largely replaces the ARM32_PMAP_NEEDS_PTE_SYNC manual setting Steve
included with the original new pmap.
* In the new pmap, make pmap_pte_init_generic() check to see if the CPU
has a write-back cache. If so, init the PT cache mode to C=1,B=0 to get
write-through mode. Otherwise, init the PT cache mode to C=1,B=1.
* Add a new pmap_pte_init_arm8(). Old pmap, same as generic. New pmap,
sets page table cacheability to 0 (ARM8 has a write-back cache, but
flushing it is quite expensive).
* In the new pmap, make pmap_pte_init_arm9() reset the PT cache mode to
C=1,B=0, since the write-back check in generic gets it wrong for ARM9,
since we use write-through mode all the time on ARM9 right now. (What
this really tells me is that the test for write-through cache is less
than perfect, but we can fix that later.)
* Add a new pmap_pte_init_sa1(). Old pmap, same as generic. New pmap,
does generic initialization, then resets page table cache mode to
C=1,B=1, since C=1,B=0 does not produce write-through on the SA-1.
Some features of the new pmap are:
- It allows L1 descriptor tables to be shared efficiently between
multiple processes. A typical "maxusers 32" kernel, where NPROC is set
to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
with just 4 L1s. This completely solves the problem of running out
of contiguous physical memory for allocating new L1s at runtime on a
busy system.
- Much improved cache/TLB management "smarts". This change ripples
out to encompass the low-level context switch code, which is also
much smarter about when to flush the cache/TLB, and when not to.
- Faster allocation of L2 page tables and associated metadata thanks,
in part, to the pool_cache enhancements recently contributed to
NetBSD by Wasabi Systems.
- Faster VM space teardown due to accurate referenced tracking of L2
page tables.
- Better/faster cache-alias tracking.
The new pmap is enabled by adding options ARM32_PMAP_NEW to the kernel
config file, and making the necessary changes to the port-specific
initarm() function. Several ports have already been converted and will
be committed shortly.
and hence save fewer into the PCB. This should give me enough free
registers in cpu_switch to tidy things up and support MULTIPROCESSOR
properly. While we're here, make the stacked registers into an
APCS stack frame, so that DDB backtraces through cpu_switch() will
work.
This also affects cpu_fork(), which has to fabricate a switchframe and
PCB for the new process.
and boot multi-user on a single-processor machine. Many of these changes
are wildly inappropriate for actual multi-processor operation, and correcting
this will be my next task.
Significant cleanup, here, including better PTE bit names.
* Add XScale PTE extensions (ECC enable, write-allocate cache mode).
* Mechanical changes everywhere else to update for new pte.h. While
doing this, two bugs (as a result of typos) were fixed in
arm/arm32/bus_dma.c
evbarm/integrator/int_bus_dma.c
Note that this has been compiled on some systems, cats, IQ80310, IPAQ, netwinder and shark (note that shark's build is currently broken due to other reasons), but only actually run on cats.
Shark doesn't make use of the functionality as I believe there has to be a correlation between OFW and the kernel tables so that calls into OFW work.
enabled by definining __ARM_FIQ_INDIRECT in <machine/types.h>.
This is needed for OpenFirmware systems (like the Shark), where
the OFW vector page is used, and kernel entries merely patched
into it.
pass. Rather than providing a whole slew of cache operations that
aren't ever used, distill them down to some useful primitives:
icache_sync_all Synchronize I-cache
icache_sync_range Synchronize I-cache range
dcache_wbinv_all Write-back and Invalidate D-cache
dcache_wbinv_range Write-back and Invalidate D-cache range
dcache_inv_range Invalidate D-cache range
dcache_wb_range Write-back D-cache range
idcache_wbinv_all Write-back and Invalidate D-cache,
Invalidate I-cache
idcache_wbinv_range Write-back and Invalidate D-cache,
Invalidate I-cache range
Note: This does not yet include an overhaul of the actual asm files
that implement the primitives. Instead, we've provided a safe default
for each CPU type, and the individual CPU types can now be optimized
one at a time.
* Use a common set of exception handlers for all arm32 platforms.
* New FIQ framework based on discussions with Ben Harris, shared
between arm26 and arm32.