read/write-allocate line allocation policy.
On the i80321, this improves nearly every lmbench benchmark, most
dramatically the ones that are sensitive to memory bandwidth (100-300%
improvement for these).
cache, as well. The mini-data cache is 2-way, so src and dst won't
clobber each other, and the smallness of the cache doesn't matter,
since we access each page once sequentially.
While we still have to do the initial clean of the source page, this
saves another 4K of main D$ pollution, and also means we don't have
to do 2 cache passes after the copy is complete (i.e. we can skip the
invalidation of the source page in the main cache, since it's no longer
there).
vs. XScale. Use the mini-data cache for the destination on XScale,
which avoids evicting 4K of possibly useful data from the main
data cache each time.
This significantly improves every test in lmbench.
and pte_l2_s_cache_mode. The cache-meaningful bits are different
for these descriptor types on some processor models.
* Add pte_*_cache_mask, corresponding to each above, which has a mask
of the cache-meaningful bits, and define those for generic and XScale
MMU classes. Note, the L2_S_CACHE_MASK_xscale definition requires
use of the Extended Small Page L2 descriptor (the "X" bit overlaps
with AP bits otherwise).
L1_C_PROTO_xscale; while they are supposed to be set to 1 on generic
ARM MMUs (according to the SA-110 and ARM920T manuals), they are listed
as "should be zero" in the i80200 manual.
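As a rough illustration of how a cache mode and cache mask pair like the ones described above might be used together, the sketch below clears the cache-meaningful bits of a small-page descriptor and then applies the configured mode. All `ex_`/`EX_` names are hypothetical stand-ins invented for this example, and the bit values are illustrative only; they are not taken from the real headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ex_pt_entry_t;

/* Illustrative bit values only (assumption, not the real pte.h). */
#define EX_L2_B 0x00000004u   /* bufferable */
#define EX_L2_C 0x00000008u   /* cacheable */

/* Hypothetical stand-ins for pte_l2_s_cache_mode and its cache mask. */
static ex_pt_entry_t ex_l2_s_cache_mode = EX_L2_B | EX_L2_C;
static ex_pt_entry_t ex_l2_s_cache_mask = EX_L2_B | EX_L2_C;

/* Make a small-page PTE cacheable using the configured mode. */
static ex_pt_entry_t
ex_pte_set_cacheable(ex_pt_entry_t pte)
{
	/* The mode must only use bits that the mask covers. */
	assert((ex_l2_s_cache_mode & ~ex_l2_s_cache_mask) == 0);
	return (pte & ~ex_l2_s_cache_mask) | ex_l2_s_cache_mode;
}

/* Make a small-page PTE uncached by clearing every cache-meaningful bit. */
static ex_pt_entry_t
ex_pte_set_uncached(ex_pt_entry_t pte)
{
	return pte & ~ex_l2_s_cache_mask;
}
```

Keeping the mask separate from the mode means the "make uncached" path does not need to know which bits a given processor model treats as cache-related.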
1. Generic (compatible with ARM6)
2. XScale (can be used as generic, but also has certain nifty extensions).
Define abstract PTE bit definitions for each MMU class. If only one MMU
class is configured into the kernel (based on CPU_* options), then we
get the constants for that MMU class. Otherwise we indirect through
variables set up via set_cpufuncs().
XXX The XScale bits are currently the same as the generic bits. Baby steps.
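The constant-versus-variable selection described above could look roughly like the following sketch. Every `EX_`/`ex_` name here is hypothetical, the bit value is a placeholder, and `ex_set_mmu_class()` merely stands in for whatever set_cpufuncs() does; it is not the real interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration knobs; a real kernel derives these from CPU_* options. */
#ifndef EX_MMU_GENERIC
#define EX_MMU_GENERIC 1
#endif
#ifndef EX_MMU_XSCALE
#define EX_MMU_XSCALE 1
#endif
#define EX_NMMUS (EX_MMU_GENERIC + EX_MMU_XSCALE)

#if EX_NMMUS == 0
#error "at least one MMU class must be configured"
#endif

/* Placeholder values; the two classes currently share the same bits. */
#define EX_PROT_W_generic 0x00000020u
#define EX_PROT_W_xscale  0x00000020u

#if EX_NMMUS > 1
/* More than one class configured: indirect through a variable. */
static uint32_t ex_pte_prot_w;
#define EX_PROT_W ex_pte_prot_w
#elif EX_MMU_GENERIC == 1
#define EX_PROT_W EX_PROT_W_generic   /* compile-time constant */
#else
#define EX_PROT_W EX_PROT_W_xscale    /* compile-time constant */
#endif

enum ex_mmu_class { EX_CLASS_GENERIC, EX_CLASS_XSCALE };

/* Stand-in for the CPU setup hook that fills in the variables. */
static void
ex_set_mmu_class(enum ex_mmu_class c)
{
#if EX_NMMUS > 1
	ex_pte_prot_w = (c == EX_CLASS_XSCALE) ?
	    EX_PROT_W_xscale : EX_PROT_W_generic;
#else
	(void)c;   /* nothing to do: the constant was chosen at compile time */
#endif
}

static uint32_t
ex_pte_make_writable(uint32_t pte)
{
	assert(EX_PROT_W != 0);   /* catches use before the class is selected */
	return pte | EX_PROT_W;
}
```

With a single class configured, `EX_PROT_W` folds to a constant and the compiler can generate an immediate operand; the variable load is only paid when the kernel must support both classes at run time.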
Significant cleanup, here, including better PTE bit names.
* Add XScale PTE extensions (ECC enable, write-allocate cache mode).
* Mechanical changes everywhere else to update for new pte.h. While
doing this, two bugs (as a result of typos) were fixed in:
    arm/arm32/bus_dma.c
    evbarm/integrator/int_bus_dma.c
the lower bits; UVM provides us page-aligned addresses for
everything. For the paranoid, we'll leave KDASSERT()'s in
that check for this if the kernel is built with DEBUG.
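A minimal sketch of the idea, assuming a 4K page and a debug-only assertion macro: `EX_KDASSERT` is a hypothetical user-space stand-in for KDASSERT(), and `ex_make_pte()` is an invented helper, not a function from the pmap.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u
#define EX_PAGE_MASK (EX_PAGE_SIZE - 1)

/* Hypothetical stand-in for KDASSERT(): active only when built with DEBUG. */
#ifdef DEBUG
#define EX_KDASSERT(e) assert(e)
#else
#define EX_KDASSERT(e) ((void)0)
#endif

/*
 * Build a PTE from a physical address and protection bits.
 * The address is trusted to be page-aligned, so it is not masked;
 * a DEBUG build verifies that assumption instead.
 */
static uint32_t
ex_make_pte(uint32_t pa, uint32_t prot_bits)
{
	EX_KDASSERT((pa & EX_PAGE_MASK) == 0);
	return pa | prot_bits;
}
```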
Low-hanging fruit that shaves some cycles.
pmap.h and give them more descriptive names and better comments:
* PT_M -> PVF_MOD (page is modified)
* PT_H -> PVF_REF (page is referenced)
* PT_W -> PVF_WIRED (mapping is wired)
* PT_Wr -> PVF_WRITE (mapping is writable)
* PT_NC -> PVF_NC (mapping is non-cacheable; multiple mappings)
* Don't refer to VA 0, instead refer to a new variable: vector_page
* Delete the old zero_page_*() functions, replacing them with a new
one: vector_page_setprot().
* When manipulating vector page mappings in user pmaps, only do so if
the vector page is below KERNEL_BASE (if it's above KERNEL_BASE, the
vector page is mapped by the kernel pmap).
* Add a new function, arm32_vector_init(), which takes the virtual
address of the vector page (which MUST be valid when the function
is called) and a bitmask of vectors the kernel is going to take
over, and performs all vector page initialization, including setting
the V bit in the CPU Control register ("relocate vectors to high
address"), if necessary.
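The bitmask-driven part of such an initialization routine might be sketched as follows. This is a user-space simulation under stated assumptions: `ex_vector_init()`, `EX_NVEC` and the array arguments are hypothetical names, the vector page is modelled as a plain array, and the step that sets the V bit in the CPU Control register is omitted because it requires the real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NVEC 8   /* ARM defines eight exception vector slots */

/*
 * Copy only the vectors selected by the bitmask "which" from the
 * kernel's vector table into the vector page.  Slots whose bit is
 * clear are left untouched, so whatever installed them keeps them.
 */
static void
ex_vector_init(uint32_t *vector_page, const uint32_t *kernel_vectors,
    unsigned int which)
{
	int vec;

	/* The vector page must already be valid when we are called. */
	assert(vector_page != NULL);
	assert(kernel_vectors != NULL);

	for (vec = 0; vec < EX_NVEC; vec++) {
		if ((which & (1u << vec)) == 0)
			continue;   /* the kernel is not taking over this vector */
		vector_page[vec] = kernel_vectors[vec];
	}
}
```

A caller that wants to take over only two of the eight slots would pass a mask with just those two bits set; passing `0xff` would take over all of them.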
when the part being queried was mapped with a section (!), giving weird
results, and had become a mess of gotos.
Rewrote it completely and cleaned up the `goto' jungle entirely; all
gotos are gone. The resulting code is much easier to read and might even
have a small performance gain.
I/O processors:
* The i80200 and the i80321 have the same CPU ID, so split the
CPU_XSCALE option into CPU_XSCALE_80200 and CPU_XSCALE_80321
options, and don't let them both be defined at the same time.
XXX May want to revisit this in the future.
* Split some registers common between the i80200 and i80321 into
<arm/xscale/xscalereg.h>.
* Rename a few existing functions.
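Because the two parts report the same CPU ID, the choice has to be made at build time. A minimal sketch of a compile-time guard for the two options follows; the default-selection block at the top and the `ex_` names are assumptions added only so the example compiles on its own.

```c
/* For illustration only: pretend the kernel config selected the i80321. */
#if !defined(CPU_XSCALE_80200) && !defined(CPU_XSCALE_80321)
#define CPU_XSCALE_80321 1
#endif

/* Refuse to build if both options are configured at once. */
#if defined(CPU_XSCALE_80200) && defined(CPU_XSCALE_80321)
#error "CPU_XSCALE_80200 and CPU_XSCALE_80321 are mutually exclusive"
#endif

enum ex_xscale_variant { EX_I80200, EX_I80321 };

/*
 * A run-time probe of the CPU ID cannot tell the two parts apart,
 * so the answer comes from the build option instead.
 */
static enum ex_xscale_variant
ex_xscale_variant(void)
{
#if defined(CPU_XSCALE_80200)
	return EX_I80200;
#else
	return EX_I80321;
#endif
}
```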
pmap_copy_page() will never have any mappings. Therefore, it
is unnecessary to do a cache clean for that page.
Add assertions under #ifdef DEBUG that check this invariant.
This shaves some cycles off the frequently-called pmap_zero_page()
and pmap_copy_page() (no need to look up the dst page's vm_page
structure, and one less function call to clean the page).
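The shape of this change can be sketched in user space as below. `struct ex_page`, its `mappings` counter and the two functions are hypothetical simplifications; the real routines work on physical pages and temporary kernel mappings, which this example does not model.

```c
#include <assert.h>
#include <string.h>

#define EX_PAGE_SIZE 4096u

/* Simplified page: its contents plus a count of existing mappings. */
struct ex_page {
	unsigned char data[EX_PAGE_SIZE];
	unsigned int mappings;
};

static void
ex_zero_page(struct ex_page *dst)
{
#ifdef DEBUG
	/* Invariant: the destination page has no mappings yet. */
	assert(dst->mappings == 0);
#endif
	/* No cache clean of dst here: an unmapped page has nothing cached. */
	memset(dst->data, 0, EX_PAGE_SIZE);
}

static void
ex_copy_page(const struct ex_page *src, struct ex_page *dst)
{
#ifdef DEBUG
	assert(dst->mappings == 0);
#endif
	memcpy(dst->data, src->data, EX_PAGE_SIZE);
}
```

In a non-DEBUG build the check compiles away entirely, so the fast path does no lookup and no extra call for the destination page.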