and hence save fewer into the PCB. This should give me enough free
registers in cpu_switch to tidy things up and support MULTIPROCESSOR
properly. While we're here, make the stacked registers into an
APCS stack frame, so that DDB backtraces through cpu_switch() will
work.
This also affects cpu_fork(), which has to fabricate a switchframe and
PCB for the new process.
GCC produces almost exactly the same instructions as the hand-assembled
versions, albeit in a different order. It even found one place where it
could shave one off. Its insistence on creating a stack frame might slow
things down marginally, but not, I think, enough to matter.
in question, whereas the ARM code was using it to hold the model
identification. To fix this, rename:
ci_cpuid -> ci_arm_cpuid
ci_cputype -> ci_arm_cputype (for consistency)
ci_cpurev -> ci_arm_cpurev (ditto)
ci_cpunum -> ci_cpuid
This makes top(1) give correct CPU numbers in its "STATE" column (all 0 for
now).
in its own little inline function, and this allows us to get rid of all the
automatic variables elsewhere. This subtly changes the semantics of
__cpu_simple_lock() such that the loop ends up one instruction longer, but
I'm not sure that's a particularly bad thing.
and boot multi-user on a single-processor machine. Many of these changes
are wildly inappropriate for actual multi-processor operation, and correcting
this will be my next task.
This merge changes the device switch tables from static array to
dynamically generated by config(8).
- All device switches is defined as a constant structure in device drivers.
- The new grammer ``device-major'' is introduced to ``files''.
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammer.
- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.
- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.
- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.
- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
to do uncached memory access during VM operations (which can be
quite expensive on some CPUs).
We currently write-back PTEs as soon as they're modified; there is
some room for optimization (to write them back in larger chunks).
For PTEs in the APTE space (i.e. PTEs for pmaps that describe another
process's address space), PTEs must also be evicted from the cache
complete (PTEs in PTE space will be evicted durint a context switch).
Change the bus_dmamap_sync() macro to test the ops argument against pre-
and post- constants. The compiler will optimize out dead code because
of the constants. Since post- operations are not needed on ARM (except
for ISA bounce buffers), this eliminate a large number of function calls
which are noops, each of which cost at least 6 cycles just in the call
and return overhead (not to mention whatever other useless work the
compiler decides to do in the callee).
the CPU's "sleep" function in the idle loop.
* Default all CPUs to not use powersave, except for the PDA processors
(SA11x0 and PXA2x0).
This significantly reduces inteterrupt latency in high-performance
applications (and was good to squeeze another ~10% out of an XScale
IOP on a Gig-E benchmark).
map contains "coherent" (non-cached in ARM-land) mappings.
* Set ARM32_DMAMAP_COHERENT in the map at the start of a load operation,
and clear it in _bus_dmamap_load_buffer() if we encounter any cacheable
mappings.
* In _bus_dmamap_sync(), if the map is marked COHERENT, skip any cache
flushing.
* Replace __byte_swap_32_variable() with a C version from Richard
Earnshaw that generates nearly identical assembly (and it would be
exactly identical with the addition of another peephole to GCC ARM
back-end).
* Byte-swap 16-bit and 32-bit constants at compile-time.
* Inline 16-bit and 32-bit variable byte-swaps. These take 3 and 4
insns, respectively, and inlining saves the minimum 6 cycle penalty
to call/return from the byte swap function.
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.
pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.
A new "arm32_dma_range" structure now describes a DMA window, with
a system address base, bus address base, and length. In addition to
providing info about which memory regions are legal for DMA, the new
structure provides address translation support, as well.
As before, if a tag does not list any ranges, then all addresses are
considered valid, and no DMA address translation is performed.
This allows us to remove a large chunk of code which was duplicated and
tweaked slightly (to do the address translation) from the stock ARM
bus_dma in the XScale IOP and ARM Integrator ports.
Test compiled on all ARM platforms, test booted on Intel IQ80321 and Shark.
into platform-specific initialization code, giving platform-specific
code control over which free list a given chunk of memory gets put
onto.
Changes are essentially mechanical. Test compiled for all ARM
platforms, test booted on Intel IQ80321 and Shark.
Discussed some time ago on port-arm.
the virtual address for each DMA segment, just cache a pointer to the
original buffer/buftype used to load the DMA map, and use that. This
lets us shrink the bus_dma_segment_t down from 12 bytes to 8, and the
cache flushing is also more efficient.
Tested on an i80321 -- changes to others are mechanical.
not actually be able to unblock the interrupt, which would cause us
to run the softclock interrupts with hardclock blocked.
Per discussion w/ Charles Hannum.