This merge changes the device switch tables from static arrays to
tables generated dynamically by config(8).
- All device switches are defined as constant structures in the device drivers.
- The new grammar ``device-major'' is introduced to ``files''.
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed in the port-dependent majors.<arch>
file using this grammar.
- Added a new naming convention.
The device switch must be named <prefix>_[bc]devsw for auto-generation
of the device switch tables; see the sketch after this list.
- Backward compatibility for loading block/character device switches
via the LKM framework is broken. This is necessary for converting
between block/character device majors and device names at runtime.
- The restriction on assigning device majors via LKM is completely removed.
We no longer need to reserve LKM entries for dynamically loaded device switches.
- At compile time, the list of device major numbers is packed into the kernel,
and the LKM framework refers to it to assign device major numbers dynamically.
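
As a rough illustration of the naming convention (the driver prefix
"foo" and the exact field list are hypothetical here; see sys/conf.h
for the real layout), a driver now exports a constant switch that
config(8) references from the generated tables:

    /* foo.c: constant switch named <prefix>_cdevsw per the convention */
    const struct cdevsw foo_cdevsw = {
        .d_open = fooopen,
        .d_close = fooclose,
        .d_read = fooread,
        .d_write = foowrite,
        .d_ioctl = fooioctl,
        /* remaining members omitted for brevity */
        .d_flag = D_OTHER,
    };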
/dev/ttyS0 crashed the kernel. This is because sacom_filltx uses some
uninitialized static variables. Pulling the values from the softc instead
fixes the problem (this is what was done before the driver was moved
from /sys/arch/hpcarm to /sys/arch/arm, anyway).
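
Schematically, the fix looks like this (the softc member names are
assumptions, not the actual sacom layout):

    /* Before: file-scope statics that nothing ever initialized. */
    static bus_space_tag_t iot;
    static bus_space_handle_t ioh;

    /* After: sacom_filltx() takes the values from the softc, which
     * was filled in at attach time. */
    bus_space_tag_t iot = sc->sc_iot;
    bus_space_handle_t ioh = sc->sc_ioh;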
the 4M super-section that the PTP will map, not some random 1M
chunk of it. This gives the PTP hint code a much better chance
of working properly, and allows us to tidy up the code that
flushes a PTP from the cache in pmap_destroy().
to do uncached memory access during VM operations (which can be
quite expensive on some CPUs).
We currently write back PTEs as soon as they're modified; there is
some room for optimization (to write them back in larger chunks).
For PTEs in the APTE space (i.e. PTEs for pmaps that describe another
process's address space), PTEs must also be evicted from the cache
completely (PTEs in the PTE space will be evicted during a context switch).
Change the bus_dmamap_sync() macro to test the ops argument against pre-
and post- constants. The compiler will optimize out dead code because
of the constants. Since post- operations are not needed on ARM (except
for ISA bounce buffers), this eliminates a large number of function calls
that are no-ops, each of which costs at least 6 cycles just in the call
and return overhead (not to mention whatever other useless work the
compiler decides to do in the callee).
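
The shape of the macro, as a sketch (the pre/post hook members are
assumptions, not the actual tag layout):

    #define bus_dmamap_sync(t, p, o, l, ops)                            \
    do {                                                                \
        /* Dead code when "ops" is a constant without PRE bits. */      \
        if ((ops) & (BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE))       \
            (*(t)->_dmamap_sync_pre)((t), (p), (o), (l), (ops));        \
        /* Dead code when "ops" is a constant without POST bits; on     \
         * ARM only the ISA bounce-buffer tag installs a post hook. */  \
        if (((ops) & (BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE)) && \
            (t)->_dmamap_sync_post != NULL)                             \
            (*(t)->_dmamap_sync_post)((t), (p), (o), (l), (ops));       \
    } while (/*CONSTCOND*/ 0)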
* Save an instruction in the transition from idle to have-process-to-
switch-to, and eliminate two instructions that cause datadep-stalls
on StrongARM and XScale (one in each idle block).
* Rearrange some other instructions to avoid datadep-stalls on StrongARM
and XScale.
* Since cpu_do_powersave == 0 is by far the common case, avoid a
pipeline flush by reordering the two idle blocks.
the CPU's "sleep" function in the idle loop.
* Default all CPUs to not use powersave, except for the PDA processors
(SA11x0 and PXA2x0).
This significantly reduces interrupt latency in high-performance
applications (and was good to squeeze another ~10% out of an XScale
IOP on a Gig-E benchmark).
of the range are aligned to a cacheline boundary, do a dcache-inv
operation rather than a dcache-wbinv operation.
XXX It could be a little smarter (align using wbinv, inv, then finish
up using wbinv), but even this simple change is good for a nearly 40%
improvement in my test case on XScale.
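
A sketch of the test (variable and helper names assumed; the cache ops
are the usual ARM cpufunc entry points):

    /* If both ends sit on cacheline boundaries, no dirty partial
     * line can overlap the buffer, so a plain invalidate is safe. */
    if (((va | len) & arm_dcache_align_mask) == 0)
        cpu_dcache_inv_range(va, len);
    else
        cpu_dcache_wbinv_range(va, len);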
condition on a branch (bpl where bhi should have been used). The error
caused one more line than intended to be flushed, which is particularly
bad if you're doing a dcache-invalidate operation.
map contains "coherent" (non-cached in ARM-land) mappings.
* Set ARM32_DMAMAP_COHERENT in the map at the start of a load operation,
and clear it in _bus_dmamap_load_buffer() if we encounter any cacheable
mappings.
* In _bus_dmamap_sync(), if the map is marked COHERENT, skip any cache
flushing.
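
The resulting fast path in _bus_dmamap_sync() is roughly this (the
flag's placement in _dm_flags is an assumption):

    /* Coherent (non-cached) mappings need no cache maintenance. */
    if (map->_dm_flags & ARM32_DMAMAP_COHERENT)
        return;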
* Replace __byte_swap_32_variable() with a C version from Richard
Earnshaw that generates nearly identical assembly (and it would be
exactly identical with the addition of another peephole to the GCC ARM
back-end); the idiom is sketched below.
* Byte-swap 16-bit and 32-bit constants at compile-time.
* Inline 16-bit and 32-bit variable byte-swaps. These take 3 and 4
insns, respectively, and inlining saves the minimum 6-cycle penalty
of calling/returning from the byte-swap function.
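
For reference, the C idiom in question looks like this (a sketch; the
in-tree naming may differ):

    static __inline uint32_t
    __byte_swap_32(uint32_t x)
    {
        uint32_t t;

        /* GCC's ARM back-end turns this into the classic 4-insn
         * sequence: eor/bic with rotates, then a rotate and eor. */
        t = x ^ ((x << 16) | (x >> 16));
        t &= ~0x00ff0000;
        x = (x << 24) | (x >> 8);
        return (x ^ (t >> 8));
    }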
cache line allocation policy on XScale CPUs: in pmap_enter(), if the
pmap is the kernel pmap, clear the X-bit in the PTE, thus disabling
read/write-allocate for managed kernel mappings.
Yes, this is ugly. But it makes userland code run with r/w-allocate,
which is a huge improvement on systems with low core memory performance.
This version works on both 26-bit and 32-bit machines. For large copies,
it's up to three times as fast as the old arm32 version and five times as
fast as the old arm26 version. For small copies it seems to be even faster
(getrusage() is apparently over ten times faster on an ARM610).
Hooray for Allen!
counters. These counters do not exist on all CPUs, but where they
do exist, they can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.
pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.
page tables.
- pmap_enter(): if making a mapping for the same PA rw->ro, write back
the cache before doing so.
- pmap_clearbit(): if revoking REF on a page, make sure to wbinv the
cache if the page has write permission, else inv the cache if the page's
PTE is valid (XXX we actually wbinv in this case, as well, due to lack
of idcache_inv_range()). Only flush the TLB if the PTE changed.
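
The pmap_clearbit() rule above, as an illustrative fragment (helper
and flag names are assumptions, not the actual pmap internals):

    if (maskbits & PT_REF) {
        if (page_is_writable)
            cpu_dcache_wbinv_range(va, NBPG);
        else if (pte_valid)
            /* would prefer idcache_inv_range(); wbinv for now */
            cpu_dcache_wbinv_range(va, NBPG);
        if (npte != opte)
            cpu_tlb_flushID_SE(va);    /* TLB only if PTE changed */
    }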
nathanw_sa branch.
* In switch_exit(), set the outgoing-proc register to NULL (rather than
proc0) so that we actually use the "exiting process" optimization in
cpu_switch().
Unit. The AAU provides block fill, block copy, XOR, and XOR-parity-check
operations. We currently provide dmover(9) functions for "zero", "fill8",
and "copy".
Much of this code can be shared with the i80312 Companion I/O AAU, and
will be when support for the older chip is implemented.
A new "arm32_dma_range" structure now describes a DMA window, with
a system address base, bus address base, and length. In addition to
providing info about which memory regions are legal for DMA, the new
structure provides address translation support, as well.
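
The structure itself is small; roughly (member names here follow the
description above and are assumptions):

    struct arm32_dma_range {
        bus_addr_t dr_sysbase;    /* system address base */
        bus_addr_t dr_busbase;    /* bus address base */
        bus_size_t dr_len;        /* length of the window */
    };

A tag points at an array of these, and translation during a load is
simply busaddr = pa - dr_sysbase + dr_busbase for the range that
contains pa.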
As before, if a tag does not list any ranges, then all addresses are
considered valid, and no DMA address translation is performed.
This allows us to remove a large chunk of code which was duplicated and
tweaked slightly (to do the address translation) from the stock ARM
bus_dma in the XScale IOP and ARM Integrator ports.
Test compiled on all ARM platforms, test booted on Intel IQ80321 and Shark.
into platform-specific initialization code, giving platform-specific
code control over which free list a given chunk of memory gets put
onto.
Changes are essentially mechanical. Test compiled for all ARM
platforms, test booted on Intel IQ80321 and Shark.
Discussed some time ago on port-arm.
the virtual address for each DMA segment, just cache a pointer to the
original buffer/buftype used to load the DMA map, and use that. This
lets us shrink the bus_dma_segment_t down from 12 bytes to 8, and the
cache flushing is also more efficient.
Tested on an i80321 -- changes to others are mechanical.
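
Roughly, the change amounts to this (member names assumed):

    /* Cached in the map at load time... */
    map->_dm_origbuf = buf;                  /* e.g. an mbuf chain */
    map->_dm_buftype = ARM32_BUFTYPE_MBUF;

    /* ...and used at sync time to rediscover the VA covering each
     * segment, instead of storing a VA in every bus_dma_segment_t. */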
be properly used by any misc. cloning device. While here, correct
a comment to indicate that "open" is the only entry point and that
everything else is handled with fileops.
* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler (prototype sketched below).
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.
Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
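
For reference, the new system call's prototype is sketched below
(argument names are illustrative):

    int __sigaction_sigtramp(int sig,
        const struct sigaction *act, struct sigaction *oact,
        const void *tramp, int vers);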
mappings bus_dma(9) states: "In the event that the DMA handle contains
a valid mapping, the mapping will be unloaded via the same mechanism
used by bus_dmamap_unload()." And some drivers do mean to skip the
unload step.
u_int16_t and int16_t for the X and Y count registers. GCC produces better
code this way.
Also, initialise the stored state in wsqms_enable(), so that the mouse doesn't
warp to a random position on open.
sc_flags was never read. G/C it.
wsqms_attach() took two arguments that pointed to the same structure. G/C one
of them.
Since wsqms controls the same device as qms, have it match the same attach
args.
Use a callout rather than hanging off the VSYNC interrupt.
Don't emit WSMOUSE_INPUT_ABSOLUTE events, since this isn't an absolute device.
Handle counter wrap-around sensibly, rather than limiting counts.
Don't gratuitously copy sc->sc_dev onto itself at attach time.
the palette generation to work with arbitrary numbers of bits.
This allows X to work after a fashion, since it tries to put the VIDC into
a 6:5:5 mode itself (which we ignore). Anything that actually tries to take
advantage of the DirectColor visual it offers will still be screwed, but I
hope such applications are rare.
This time, vidcvideo_stdpalette() uses vidcvideo_write(), as it should, and
correctly initialises the palette in 16bpp and (I hope) 32bpp modes.
This fixes the colours on the text console in 16bpp modes. 32bpp seems to be
generally broken anyway.
to set it, and vidcprint isn't needed to print it. G/C all that code, and
most of the rest of vidcsearch too.
This also means that the locators on vidc's children are unused, so G/C them
as well.