This merge changes the device switch tables from static arrays to
ones dynamically generated by config(8).
- All device switches are defined as constant structures in device drivers.
- The new grammar ``device-major'' is introduced to ``files''.
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed in the port-dependent
  majors.<arch> file using this grammar.
- Added a new naming convention.
  The name of a device switch must be <prefix>_[bc]devsw for
  auto-generation of the device switch tables (see the sketch after
  this list).
- Backward compatibility for loading block/character device switches
  via the LKM framework is broken. This is necessary to convert
  between block/character device majors and device names at runtime.
- The restriction on assigning device majors via LKM is completely
  removed; we no longer need to reserve LKM entries for dynamically
  loaded device switches.
- At compile time, the list of device major numbers is packed into
  the kernel, and the LKM framework refers to it when assigning
  device majors dynamically.
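For illustration, here is a minimal sketch of the naming convention in
a driver, assuming a hypothetical driver prefix ``mydev''; the cdevsw
member list below is abbreviated and the function names are
illustrative (see sys/conf.h for the real structure and defaults):

	/*
	 * Hypothetical "mydev" driver following the <prefix>_[bc]devsw
	 * naming convention; config(8) can then emit a reference to
	 * mydev_cdevsw in the generated character device switch table.
	 */
	#include <sys/conf.h>

	dev_type_open(mydevopen);
	dev_type_close(mydevclose);
	dev_type_read(mydevread);
	dev_type_write(mydevwrite);
	dev_type_ioctl(mydevioctl);

	const struct cdevsw mydev_cdevsw = {
		mydevopen, mydevclose, mydevread, mydevwrite, mydevioctl,
		nostop, notty, nopoll, nommap, nokqfilter,
	};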
the 4M super-section that the PTP will map, not some random 1M
chunk of it. This gives the PTP hint code a much better chance
of working properly, and allows us to tidy up the code that
flushes a PTP from the cache in pmap_destroy().
to do uncached memory access during VM operations (which can be
quite expensive on some CPUs).
We currently write back PTEs as soon as they're modified; there is
some room for optimization (to write them back in larger chunks).
For PTEs in the APTE space (i.e. PTEs for pmaps that describe another
process's address space), the PTEs must also be evicted from the cache
completely (PTEs in PTE space will be evicted during a context switch).
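A minimal sketch of the write-back-on-modify idea, assuming a
hypothetical pte_set() helper; cpu_dcache_wb_range() is the cpufunc
write-back hook:

	/*
	 * Sketch only: store a PTE, then clean its cache line so the
	 * table walker, which does not snoop the cache, sees the
	 * update.  Batching adjacent PTEs into a single write-back
	 * call is the optimization noted above.
	 */
	static __inline void
	pte_set(pt_entry_t *ptep, pt_entry_t pte)
	{
		*ptep = pte;
		cpu_dcache_wb_range((vaddr_t)ptep, sizeof(*ptep));
	}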
* Save an instruction in the transition from idle to have-process-to-
switch-to, and eliminate two instructions that cause datadep-stalls
on StrongARM and XScale (one in each idle block).
* Rearrange some other instructions to avoid datadep-stalls on StrongARM
and XScale.
* Since cpu_do_powersave == 0 is by far the common case, avoid a
pipeline flush by reordering the two idle blocks.
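In C terms, the reordering amounts to the sketch below; the real code
is assembly in cpu_switch.S, and the loop body here is illustrative:

	/*
	 * Keep the common case (powersave disabled) on the
	 * fall-through path: a not-taken branch costs nothing, while
	 * a taken branch flushes the pipeline on these cores.
	 */
	for (;;) {
		if (sched_whichqs != 0)
			break;			/* a process is runnable */
		if (__predict_false(cpu_do_powersave))
			cpu_sleep(0);		/* low-power wait */
	}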
the CPU's "sleep" function in the idle loop.
* Default all CPUs to not use powersave, except for the PDA processors
(SA11x0 and PXA2x0).
This significantly reduces interrupt latency in high-performance
applications (and was good to squeeze another ~10% out of an XScale
IOP on a Gig-E benchmark).
of the range are aligned to a cacheline boundary, then do a dcache-inv
operation, rather than a dcache-wbinv operation.
XXX It could be a little smarter (align using wbinv, inv, then finish
up using wbinv), but even this simple change is good for a nearly 40%
improvement in my test case on XScale.
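A sketch of the alignment test, assuming the range is given by
addr/len and using the arm_dcache_align_mask variable; the
surrounding sync logic is omitted:

	/*
	 * If both the start address and the length are aligned to a
	 * cache line, no partial line can hold unrelated dirty data,
	 * so a pure invalidate is safe and cheaper than a write-back
	 * plus invalidate.
	 */
	if (((addr | len) & arm_dcache_align_mask) == 0)
		cpu_dcache_inv_range(addr, len);
	else
		cpu_dcache_wbinv_range(addr, len);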
map contains "coherent" (non-cached in ARM-land) mappings.
* Set ARM32_DMAMAP_COHERENT in the map at the start of a load operation,
and clear it in _bus_dmamap_load_buffer() if we encounter any cacheable
mappings.
* In _bus_dmamap_sync(), if the map is marked COHERENT, skip any cache
flushing.
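The resulting fast path in _bus_dmamap_sync() reduces to something
like this sketch (the flag name is from the log; the surrounding
logic is illustrative):

	/*
	 * A map that never contained a cacheable mapping needs no
	 * cache maintenance on sync.
	 */
	if (map->_dm_flags & ARM32_DMAMAP_COHERENT)
		return;
	/* ... otherwise, per-segment cache operations ... */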
cache line allocation policy on XScale CPUs: in pmap_enter(), if the
pmap is the kernel pmap, clear the X-bit in the PTE, thus disabling
read/write-allocate for managed kernel mappings.
Yes, this is ugly. But it makes userland code run with r/w-allocate,
which is a huge improvement on systems with low core memory performance.
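In pmap_enter(), the check is along these lines; PTE_X is a
hypothetical stand-in for however the XScale extension bit is spelled
in the PTE macros:

	/*
	 * Kernel pmap: clear the hypothetical PTE_X bit, disabling
	 * read/write-allocate for this mapping.
	 */
	if (pm == pmap_kernel())
		npte &= ~PTE_X;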
This version works on both 26-bit and 32-bit machines. For large copies,
it's up to three times as fast as the old arm32 version and five times as
fast as the old arm26 version. For small copies it seems to be even faster
(getrusage() is apparently over ten times faster on an ARM610).
Hooray for Allen!
counters. These counters do not exist on all CPUs, but where they do
exist, they can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.
pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.
page tables.
- pmap_enter(): if making a mapping for the same PA rw->ro, write back
  the cache before doing so.
- pmap_clearbit(): if revoking REF on a page, make sure to wbinv the
cache if the page has write permission, else inv the cache if the page's
PTE is valid (XXX we actually wbinv in this case, as well, due to lack
of idcache_inv_range()). Only flush the TLB if the PTE changed.
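The pmap_clearbit() policy in sketch form, with illustrative names for
the permission and validity tests:

	/*
	 * Dirty data must be written back before the mapping changes;
	 * a clean-but-valid line would only need invalidation, but
	 * without idcache_inv_range() we write back there too (the
	 * XXX above).
	 */
	if (pte & L2_S_PROT_W)			/* writable: may be dirty */
		cpu_idcache_wbinv_range(va, PAGE_SIZE);
	else if (l2pte_valid(pte))
		cpu_idcache_wbinv_range(va, PAGE_SIZE); /* XXX want inv only */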
nathanw_sa branch.
* In switch_exit(), set the outgoing-proc register to NULL (rather than
proc0) so that we actually use the "exiting process" optimization in
cpu_switch().
A new "arm32_dma_range" structure now describes a DMA window, with
a system address base, bus address base, and length. In addition to
providing info about which memory regions are legal for DMA, the new
structure provides address translation support, as well.
As before, if a tag does not list any ranges, then all addresses are
considered valid, and no DMA address translation is performed.
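In outline, the structure carries the three values named above; a
sketch, with field names assumed (see the ARM bus_dma headers for the
real definition):

	/*
	 * One DMA window: CPU addresses in [dr_sysbase, dr_sysbase +
	 * dr_len) are valid for DMA and appear on the bus at
	 * dr_busbase + (addr - dr_sysbase).
	 */
	struct arm32_dma_range {
		bus_addr_t	dr_sysbase;	/* CPU view of the window */
		bus_addr_t	dr_busbase;	/* device view of the window */
		bus_size_t	dr_len;		/* window length in bytes */
	};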
This allows us to remove a large chunk of code which was duplicated and
tweaked slightly (to do the address translation) from the stock ARM
bus_dma in the XScale IOP and ARM Integrator ports.
Test compiled on all ARM platforms, test booted on Intel IQ80321 and Shark.
into platform-specific initialization code, giving platform-specific
code control over which free list a given chunk of memory gets put
onto.
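Platform code makes the choice via uvm_page_physload()'s free-list
argument, e.g. (values illustrative):

	/*
	 * Load a chunk of physical memory onto a free list chosen by
	 * the platform; the address arguments are page frame numbers.
	 */
	uvm_page_physload(atop(start), atop(end),
	    atop(avail_start), atop(avail_end), VM_FREELIST_DEFAULT);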
Changes are essentially mechanical. Test compiled for all ARM
platforms, test booted on Intel IQ80321 and Shark.
Discussed some time ago on port-arm.