add rd, pc, #foo - . - 8 -> adr rd, foo
ldr rd, [pc, #foo - . - 8] -> ldr rd, foo
Also, when saving the return address for a function pointer call, use
"mov lr, pc" just before the call unless the return address is somewhere
other than just after the call site.
Finally, a few obvious little micro-optimisations like using LDR directly
rather than ADR followed by LDR, and loading directly into PC rather than
bouncing via R0.
Change the bus_dmamap_sync() macro to test the ops argument against pre-
and post- constants. The compiler will optimize out dead code because
of the constants. Since post- operations are not needed on ARM (except
for ISA bounce buffers), this eliminate a large number of function calls
which are noops, each of which cost at least 6 cycles just in the call
and return overhead (not to mention whatever other useless work the
compiler decides to do in the callee).
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.
pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.
Unit. The AAU provides block fill, block copy, XOR, and XOR-parity-check
operations. We currently provide dmover(9) functions for "zero", "fill8",
and "copy".
Much of this code can be shared with the i80312 Companion I/O AAU, and
will be when support for the older chip is implemented.
A new "arm32_dma_range" structure now describes a DMA window, with
a system address base, bus address base, and length. In addition to
providing info about which memory regions are legal for DMA, the new
structure provides address translation support, as well.
As before, if a tag does not list any ranges, then all addresses are
considered valid, and no DMA address translation is performed.
This allows us to remove a large chunk of code which was duplicated and
tweaked slightly (to do the address translation) from the stock ARM
bus_dma in the XScale IOP and ARM Integrator ports.
Test compiled on all ARM platforms, test booted on Intel IQ80321 and Shark.
the virtual address for each DMA segment, just cache a pointer to the
original buffer/buftype used to load the DMA map, and use that. This
lets us shrink the bus_dma_segment_t down from 12 bytes to 8, and the
cache flushing is also more efficient.
Tested on an i80321 -- changes to others are mechanical.
NULL for root PCI busses. For busses behind a bridge, it points to
a persistent copy of the bridge's pcitag_t. This can be very useful
for machine-dependent PCI bus enumeration code.
* Implement a machine-dependent pci_enumerate_bus() for sparc64 which
uses OFW device nodes to enumerate the bus. When a PCI bus that is
behind a bridge is attached, pci_attach_hook() allocates a new PCI
chipset tag for the new bus and sets it's "curnode" to the OFW node
of the bridge. This is used as a starting point when enumerating
that bus. Root busses get the OFW node of the host bridge (psycho).
* Garbage-collect "ofpci" and "ofppb" from the sparc64 port.