to do uncached memory access during VM operations (which can be
quite expensive on some CPUs).
We currently write back PTEs as soon as they're modified; there is
some room for optimization (to write them back in larger chunks).
For PTEs in the APTE space (i.e. PTEs for pmaps that describe another
process's address space), PTEs must also be evicted from the cache
completely (PTEs in PTE space will be evicted during a context switch).
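A rough sketch of the immediate write-back (the helper name here is
hypothetical; cpu_dcache_wb_range() is one of the cache primitives
described further below):

	/*
	 * Hypothetical sketch: store a PTE and immediately clean the
	 * cache line holding it back to memory.
	 */
	static inline void
	pte_store_sync(pt_entry_t *ptep, pt_entry_t pte)
	{
		*ptep = pte;
		cpu_dcache_wb_range((vaddr_t)ptep, sizeof(*ptep));
	}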
Change the bus_dmamap_sync() macro to test the ops argument against pre-
and post- constants. The compiler will optimize out dead code because
of the constants. Since post- operations are not needed on ARM (except
for ISA bounce buffers), this eliminates a large number of function calls
which are no-ops, each of which costs at least 6 cycles just in the call
and return overhead (not to mention whatever other useless work the
compiler decides to do in the callee).
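The idea, as a minimal sketch (the _bus_dmamap_sync() backend name is
assumed here):

	/*
	 * Sketch: when `ops' is a compile-time constant requesting
	 * only POST operations, the test is constant-false and the
	 * compiler drops the call entirely.
	 */
	#define	bus_dmamap_sync(t, m, o, l, ops)			\
	do {								\
		if ((ops) & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE))\
			_bus_dmamap_sync((t), (m), (o), (l), (ops));	\
	} while (/*CONSTCOND*/ 0)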
A new "arm32_dma_range" structure now describes a DMA window, with
a system address base, bus address base, and length. In addition to
providing info about which memory regions are legal for DMA, the new
structure provides address translation support, as well.
As before, if a tag does not list any ranges, then all addresses are
considered valid, and no DMA address translation is performed.
This allows us to remove a large chunk of code which was duplicated and
tweaked slightly (to do the address translation) from the stock ARM
bus_dma in the XScale IOP and ARM Integrator ports.
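A sketch of the structure and the translation it enables (field and
function names assumed for illustration):

	struct arm32_dma_range {
		bus_addr_t	dr_sysbase;	/* system base address */
		bus_addr_t	dr_busbase;	/* bus base address */
		bus_size_t	dr_len;		/* range length */
	};

	/*
	 * Hypothetical helper: translate a system address to a bus
	 * address by scanning a tag's ranges.  With no ranges listed,
	 * all addresses are valid and used unmodified.
	 */
	static int
	dma_range_xlate(struct arm32_dma_range *dr, int nranges,
	    bus_addr_t sysaddr, bus_addr_t *busaddrp)
	{
		int i;

		if (nranges == 0) {
			*busaddrp = sysaddr;
			return (0);
		}
		for (i = 0; i < nranges; i++) {
			if (sysaddr >= dr[i].dr_sysbase &&
			    sysaddr < dr[i].dr_sysbase + dr[i].dr_len) {
				*busaddrp = dr[i].dr_busbase +
				    (sysaddr - dr[i].dr_sysbase);
				return (0);
			}
		}
		return (EINVAL);	/* not a DMA-able address */
	}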
Test compiled on all ARM platforms, test booted on Intel IQ80321 and Shark.
into platform-specific initialization code, giving platform-specific
code control over which free list a given chunk of memory gets put
onto.
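For illustration, platform code might hand a region to UVM like this
(boundary values hypothetical):

	/*
	 * Hypothetical: platform initialization placing a chunk of
	 * RAM onto a specific free list when loading it into UVM.
	 */
	uvm_page_physload(atop(start), atop(end),
	    atop(avail_start), atop(avail_end), VM_FREELIST_DEFAULT);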
Changes are essentially mechanical. Test compiled for all ARM
platforms, test booted on Intel IQ80321 and Shark.
Discussed some time ago on port-arm.
the virtual address for each DMA segment, just cache a pointer to the
original buffer/buftype used to load the DMA map, and use that. This
lets us shrink the bus_dma_segment_t down from 12 bytes to 8, and the
cache flushing is also more efficient.
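A sketch of the bookkeeping this implies (member names assumed; the
real map structure carries more fields):

	struct arm32_bus_dmamap_sketch {
		void	*_dm_origbuf;	/* buffer passed to bus_dmamap_load*() */
		int	_dm_buftype;	/* linear, mbuf chain, uio, or raw */
	};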
Tested on an i80321 -- changes to others are mechanical.
0xc0200000. Tidy up to remove dead comments and code.
Allow more than one L1 entry for the kernel space and use the 'spare'
memory below the kernel code for the initial page tables in the same
way that the iq80310 does.
Significant cleanup, here, including better PTE bit names.
* Add XScale PTE extensions (ECC enable, write-allocate cache mode).
* Mechanical changes everywhere else to update for new pte.h. While
doing this, two bugs (as a result of typos) were fixed in
arm/arm32/bus_dma.c
evbarm/integrator/int_bus_dma.c
* Don't refer to VA 0; instead refer to a new variable: vector_page
* Delete the old zero_page_*() functions, replacing them with a new
one: vector_page_setprot().
* When manipulating vector page mappings in user pmaps, only do so if
the vector page is below KERNEL_BASE (if it's above KERNEL_BASE, the
vector page is mapped by the kernel pmap).
* Add a new function, arm32_vector_init(), which takes the virtual
address of the vector page (which MUST be valid when the function
is called) and a bitmask of vectors the kernel is going to take
over, and performs all vector page initialization, including setting
the V bit in the CPU Control register ("relocate vectors to high
address"), if necessary.
physical memory start. Garbage-collect some cruft while here.
* Move the kernel up to 0xc0000000, giving a 1G/3G kernel/user split.
* Adjust the Integrator startup code accordingly.
Note that this has been compiled on several systems (cats, IQ80310, IPAQ,
Netwinder, and Shark; note that Shark's build is currently broken for
other reasons), but only actually run on cats.
Shark doesn't make use of the functionality, as I believe there has to be
a correlation between OFW and the kernel tables so that calls into OFW
work.
to the L1 table and a virtual address, and no pointer to the L2 table.
The L2 table will be looked up by pmap_map_entry(), which will panic
if there is no L2 table for the requested VA.
NOTE: IT IS EXTREMELY IMPORTANT THAT THE CORRECT VIRTUAL ADDRESS
BE PROVIDED TO pmap_map_entry()! Notably, the code that mapped
the kernel L2 tables into the kernel PT mapping L2 table was not
passing actual virtual addresses, but rather offsets into the range
mapped by the L2 table. I have fixed up all of these call sites,
and tested the resulting kernel on both an IQ80310 and a Shark.
Other portmasters should examine their pmap_map_entry() calls if
their new kernels fail.
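A fixed-up call site looks roughly like this (symbol names assumed):

	/*
	 * Sketch: pass the true VA of the mapping, not an offset
	 * into the range mapped by the L2 table.
	 */
	pmap_map_entry(l1pagetable, vector_page, systempage.pv_pa,
	    VM_PROT_READ|VM_PROT_WRITE, PT_CACHEABLE);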
and let pmap_map_chunk() lookup the correct one to use for the
current VA. Eliminate the "l2table" argument to pmap_map_chunk().
Add a second L2 table for mapping kernel text/data/bss on the
IQ80310 (fixes booting kernels with ramdisks).
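A pmap_map_chunk() call with the "l2table" argument gone might read
(names assumed):

	/*
	 * Sketch: pmap_map_chunk() now finds the correct L2 table
	 * itself from the L1 table and the VA.
	 */
	pmap_map_chunk(l1pagetable, KERNEL_BASE, physical_start,
	    kernelsize, VM_PROT_READ|VM_PROT_WRITE, PT_CACHEABLE);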
* Track which process (XXX really, vmspace) owns the mapping. When
we sync the map, if the mapping doesn't belong to the kernel or to
the current process (XXX really, vmspace), then no cache fobbing
is necessary, since the cache is Wb-Inv'd on context switch (XXX need
to revisit this when we support FCSE).
* Be smarter about which cache operation we do when sync'ing the map
  (a sketch follows below):
- PREREAD -- Invalidate D$ (XXX right now, we actually do Wb-Inv)
- PREWRITE -- Write-back D$ (note, we do NOT invalidate here)
- PREREAD|PREWRITE -- Wb-Inv D$
More work is needed here. In particular, a version for CPUs
with write-through caches should be provided, to eliminate
the write-back steps (which are no-ops on such CPUs, but skipping
two branches would be nice).
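The PRE-operation dispatch, sketched with the D-cache primitives
listed in the next entry (va/len assumed to describe the buffer):

	switch (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE)) {
	case BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE:
		cpu_dcache_wbinv_range(va, len);
		break;
	case BUS_DMASYNC_PREREAD:
		/* XXX a pure invalidate would do; we Wb-Inv for now */
		cpu_dcache_wbinv_range(va, len);
		break;
	case BUS_DMASYNC_PREWRITE:
		cpu_dcache_wb_range(va, len);	/* no invalidate */
		break;
	}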
pass. Rather than providing a whole slew of cache operations that
aren't ever used, distill them down to some useful primitives:
icache_sync_all Synchronize I-cache
icache_sync_range Synchronize I-cache range
dcache_wbinv_all Write-back and Invalidate D-cache
dcache_wbinv_range Write-back and Invalidate D-cache range
dcache_inv_range Invalidate D-cache range
dcache_wb_range Write-back D-cache range
idcache_wbinv_all Write-back and Invalidate D-cache,
Invalidate I-cache
idcache_wbinv_range Write-back and Invalidate D-cache,
Invalidate I-cache range
Note: This does not yet include an overhaul of the actual asm files
that implement the primitives. Instead, we've provided a safe default
for each CPU type, and the individual CPU types can now be optimized
one at a time.
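For illustration, the per-CPU indirection could look like this (the
real cpufunc switch carries many more members; these follow the list
above):

	struct cpu_cache_functions {
		void	(*cf_icache_sync_all)(void);
		void	(*cf_icache_sync_range)(vaddr_t, vsize_t);
		void	(*cf_dcache_wbinv_all)(void);
		void	(*cf_dcache_wbinv_range)(vaddr_t, vsize_t);
		void	(*cf_dcache_inv_range)(vaddr_t, vsize_t);
		void	(*cf_dcache_wb_range)(vaddr_t, vsize_t);
		void	(*cf_idcache_wbinv_all)(void);
		void	(*cf_idcache_wbinv_range)(vaddr_t, vsize_t);
	};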
<arm/arm32/vmparam.h> (mostly the stuff that's tied to the pmap
implementation).
- Since the MMU definitions in pte.h are specific to ARM processors
that support 32-bit mode, move pte.h to <arm/arm32/pte.h>.
- Make the Netwinder startup file build again (use PT_B|PT_C, rather
than PT_CACHEABLE, since the latter expands to a variable these days).
On platforms which load the kernel sans symbols directly from firmware
(possibly in e.g. S-Record format), call ddb_init() with empty arguments,
so that it will search any compiled-in SYMTAB_SPACE. On all other
platforms, if __ELF__, also call ddb_init() with empty arguments until ELF
bootloaders that pass symbol information are ready.
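A hedged example of the empty-argument call (assuming the usual
three-argument ddb_init() signature):

	#ifdef DDB
		/* No symbols passed in: search any compiled-in SYMTAB_SPACE. */
		ddb_init(0, NULL, NULL);
	#endif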