map contains "coherent" (non-cached in ARM-land) mappings.
* Set ARM32_DMAMAP_COHERENT in the map at the start of a load operation,
and clear it in _bus_dmamap_load_buffer() if we encounter any cacheable
mappings.
* In _bus_dmamap_sync(), if the map is marked COHERENT, skip any cache
flushing.
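In rough outline, the pattern looks like this (a simplified sketch, not the
literal arm/arm32/bus_dma.c code; pte_is_cacheable() stands in for the real
cacheability test):

    /* At the start of a load: assume coherent until proven otherwise. */
    map->_dm_flags |= ARM32_DMAMAP_COHERENT;

    /* In _bus_dmamap_load_buffer(), for each mapping examined: */
    if (pte_is_cacheable(pte))      /* stand-in for the real check */
            map->_dm_flags &= ~ARM32_DMAMAP_COHERENT;

    /* In _bus_dmamap_sync(): */
    if (map->_dm_flags & ARM32_DMAMAP_COHERENT)
            return;                 /* no cache lines to flush */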
* Replace __byte_swap_32_variable() with a C version from Richard
Earnshaw that generates nearly identical assembly (and it would be
exactly identical with the addition of another peephole to the GCC
ARM back-end); a sketch of the sequence follows this list.
* Byte-swap 16-bit and 32-bit constants at compile-time.
* Inline 16-bit and 32-bit variable byte-swaps. These take 3 and 4
insns, respectively, and inlining saves the minimum 6 cycle penalty
to call/return from the byte swap function.
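For reference, a 32-bit variable swap written in C along these lines compiles
to the four-instruction ARM sequence shown in the comments (a sketch of the
idea; the committed version may differ in detail):

    #include <sys/types.h>

    static __inline uint32_t
    byte_swap_32(uint32_t x)
    {
            uint32_t t;

            t = x ^ ((x << 16) | (x >> 16));   /* EOR r1, r0, r0, ROR #16 */
            t &= ~(uint32_t)0x00ff0000;        /* BIC r1, r1, #0x00ff0000 */
            x = (x << 24) | (x >> 8);          /* MOV r0, r0, ROR #8      */
            return x ^ (t >> 8);               /* EOR r0, r0, r1, LSR #8  */
    }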
cache line allocation policy on XScale CPUs: in pmap_enter(), if the
pmap is the kernel pmap, clear the X-bit in the PTE, thus disabling
read/write-allocate for managed kernel mappings.
Yes, this is ugly. But it makes userland code run with r/w-allocate,
which is a huge improvement on systems with low core memory performance.
This version works on both 26-bit and 32-bit machines. For large copies,
it's up to three times as fast as the old arm32 version and five times as
fast as the old arm26 version. For small copies it seems to be even faster
(getrusage() is apparently over ten times faster on an ARM610).
Hooray for Allen!
counters. These counters do not exist on all CPUs, but where they
do exist, they can be used for counting events such as dcache misses
that would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.
pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.
page tables.
- pmap_enter(): if making a mapping for the same PA rw->ro, write-back
the cache before doing so.
- pmap_clearbit(): if revoking REF on a page, make sure to wbinv the
cache if the page has write permission, else inv the cache if the page's
PTE is valid (XXX we actually wbinv in this case, as well, due to lack
of idcache_inv_range()). Only flush the TLB if the PTE changed.
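A rough sketch of the pmap_clearbit() logic described above (simplified; the
variable and helper names here are illustrative, not the literal pmap code):

    if (maskbits & PVF_REF) {
            if (pv->pv_flags & PVF_WRITE) {
                    /* Writable mapping: write back and invalidate. */
                    cpu_dcache_wbinv_range(va, NBPG);
            } else if (l2pte_valid(opte)) {
                    /*
                     * Valid read-only mapping: invalidation alone would
                     * do, but lacking idcache_inv_range() we wbinv here
                     * as well.
                     */
                    cpu_dcache_wbinv_range(va, NBPG);
            }
    }
    if (npte != opte) {
            *pte = npte;
            cpu_tlb_flushID_SE(va); /* flush the TLB only if the PTE changed */
    }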
nathanw_sa branch.
* In switch_exit(), set the outgoing-proc register to NULL (rather than
proc0) so that we actually use the "exiting process" optimization in
cpu_switch().
Unit. The AAU provides block fill, block copy, XOR, and XOR-parity-check
operations. We currently provide dmover(9) functions for "zero", "fill8",
and "copy".
Much of this code can be shared with the i80312 Companion I/O AAU, and
will be when support for the older chip is implemented.
A new "arm32_dma_range" structure now describes a DMA window, with
a system address base, bus address base, and length. In addition to
providing info about which memory regions are legal for DMA, the new
structure provides address translation support, as well.
As before, if a tag does not list any ranges, then all addresses are
considered valid, and no DMA address translation is performed.
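The descriptor itself is small. A minimal sketch of the structure and the
translation it enables (field names as used by the arm32 bus_dma code; the
helper below is a simplified illustration, and the real code treats a miss as
an error when ranges are present):

    struct arm32_dma_range {
            bus_addr_t      dr_sysbase;     /* system (CPU) base address */
            bus_addr_t      dr_busbase;     /* same memory as seen from the bus */
            bus_size_t      dr_len;         /* length of the window */
    };

    /* Translate a CPU physical address into a bus address (sketch). */
    static bus_addr_t
    dma_range_xlate(const struct arm32_dma_range *dr, int nranges,
        bus_addr_t curaddr)
    {
            int i;

            for (i = 0; i < nranges; i++, dr++) {
                    if (curaddr >= dr->dr_sysbase &&
                        curaddr < dr->dr_sysbase + dr->dr_len)
                            return (curaddr - dr->dr_sysbase) + dr->dr_busbase;
            }
            return curaddr;         /* no ranges listed: identity mapping */
    }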
This allows us to remove a large chunk of code which was duplicated and
tweaked slightly (to do the address translation) from the stock ARM
bus_dma in the XScale IOP and ARM Integrator ports.
Test compiled on all ARM platforms, test booted on Intel IQ80321 and Shark.
into platform-specific initialization code, giving platform-specific
code control over which free list a given chunk of memory gets put
onto.
Changes are essentially mechanical. Test compiled for all ARM
platforms, test booted on Intel IQ80321 and Shark.
Discussed some time ago on port-arm.
the virtual address for each DMA segment, just cache a pointer to the
original buffer/buftype used to load the DMA map, and use that. This
lets us shrink the bus_dma_segment_t down from 12 bytes to 8, and the
cache flushing is also more efficient.
Tested on an i80321 -- changes to others are mechanical.
be properly used by any misc. cloning device. While here, correct
a comment to indicate that "open" is the only entry point and that
everything else is handled with fileops.
* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.
Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
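For illustration, the per-signal record and the new system call have roughly
this shape (a sketch; see sys/signalvar.h and the syscall tables for the
authoritative definitions):

    struct sigact_sigdesc {
            struct sigaction sd_sigact;     /* handler, mask, flags */
            const void      *sd_tramp;      /* trampoline, or NULL */
            int              sd_vers;       /* 0 == legacy kernel trampoline */
    };

    int     __sigaction_sigtramp(int sig, const struct sigaction *nsa,
                struct sigaction *osa, const void *tramp, int vers);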
mappings bus_dma(9) states: "In the event that the DMA handle contains
a valid mapping, the mapping will be unloaded via the same mechanism
used by bus_dmamap_unload()." And some drivers do mean to skip the
unload step.
u_int16_t and int16_t for the X and Y count registers. GCC produces better
code this way.
Also, initialise the stored state in wsqms_enable(), so that the mouse doesn't
warp to a random position on open.
sc_flags was never read. G/C it.
wsqms_attach() took two arguments that pointed to the same structure. G/C one
of them.
Since wsqms controls the same device as qms, have it match the same attach
args.
Use a callout rather than hanging off the VSYNC interrupt (a callout(9)
sketch follows these notes).
Don't emit WSMOUSE_INPUT_ABSOLUTE events, since this isn't an absolute device.
Handle counter wrap-around sensibly, rather than limiting counts.
Don't gratuitously copy sc->sc_dev onto itself at attach time.
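The callout change follows the usual callout(9) pattern, roughly like this
(a sketch; the handler name, softc field, and polling rate are illustrative
guesses):

    /* At attach/enable time: */
    callout_init(&sc->sc_callout);
    callout_reset(&sc->sc_callout, hz / 10, wsqms_tick, sc);

    /* Periodic tick replacing the VSYNC hook: */
    static void
    wsqms_tick(void *arg)
    {
            struct wsqms_softc *sc = arg;

            /* Read the X/Y count registers, post wsmouse events... */
            callout_reset(&sc->sc_callout, hz / 10, wsqms_tick, sc);
    }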
the palette generation to work with arbitrary numbers of bits.
This allows X to work after a fashion, since it tries to put the VIDC into
a 6:5:5 mode itself (which we ignore). Anything that actually tries to take
advantage of the DirectColor visual it offers will still be screwed, but I
hope such applications are rare.
This time, vidcvideo_stdpalette() uses vidcvideo_write(), as it should, and
correctly initialises the palette in 16bpp and (I hope) 32bpp modes.
This fixes the colours on the text console in 16bpp modes. 32bpp seems to be
generally broken anyway.
to set it, and vidcprint isn't needed to print it. G/C all that code, and
most of the rest of vidcsearch too.
This also means that the locators on vidc's children are unused, so G/C them
as well.
- implement SIMPLEQ_REMOVE(head, elm, type, field). Whilst it's O(n),
this mirrors the functionality of SLIST_REMOVE() (the other
singly-linked list type) and FreeBSD's STAILQ_REMOVE()
- remove the unnecessary elm arg from SIMPLEQ_REMOVE_HEAD(). This
mirrors the functionality of SLIST_REMOVE_HEAD() (the other
singly-linked list type) and FreeBSD's STAILQ_REMOVE_HEAD()
- remove notes about SIMPLEQ not supporting arbitrary element removal
- use SIMPLEQ_FOREACH() instead of home-grown for loops
- use SIMPLEQ_EMPTY() appropriately
- use SIMPLEQ_*() instead of accessing sqh_first,sqh_last,sqe_next directly
- reorder manual page; be consistent about how the types are listed
- other minor cleanups
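A small usage sketch covering the macros above (hypothetical types, shown
with the new SIMPLEQ_REMOVE_HEAD() signature):

    #include <sys/queue.h>

    struct job {
            int                     j_id;
            SIMPLEQ_ENTRY(job)      j_link;
    };

    SIMPLEQ_HEAD(jobq, job);

    static void
    example(struct jobq *q, struct job *j1, struct job *j2)
    {
            struct job *j;

            SIMPLEQ_INIT(q);
            SIMPLEQ_INSERT_TAIL(q, j1, j_link);
            SIMPLEQ_INSERT_TAIL(q, j2, j_link);

            SIMPLEQ_FOREACH(j, q, j_link)           /* instead of a home-grown loop */
                    j->j_id++;

            SIMPLEQ_REMOVE(q, j2, job, j_link);     /* O(n) arbitrary removal */

            if (!SIMPLEQ_EMPTY(q))
                    SIMPLEQ_REMOVE_HEAD(q, j_link); /* note: no elm argument */
    }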
NULL for root PCI busses. For busses behind a bridge, it points to
a persistent copy of the bridge's pcitag_t. This can be very useful
for machine-dependent PCI bus enumeration code.
* Implement a machine-dependent pci_enumerate_bus() for sparc64 which
uses OFW device nodes to enumerate the bus. When a PCI bus that is
behind a bridge is attached, pci_attach_hook() allocates a new PCI
chipset tag for the new bus and sets its "curnode" to the OFW node
of the bridge. This is used as a starting point when enumerating
that bus. Root busses get the OFW node of the host bridge (psycho).
* Garbage-collect "ofpci" and "ofppb" from the sparc64 port.
Also add correct locking when freeing pages in pmap_destroy() (fix from potr).
This now means that arm32 kernels can be built with LOCKDEBUG enabled
(only tested on cats, though).
not actually be able to unblock the interrupt, which would cause us
to run the softclock interrupts with hardclock blocked.
Per discussion w/ Charles Hannum.
until the footbridge is attached we still have to rely on a loop. This
uses TIMER_3 running at 100Hz.
Sadly this doesn't appear to fix the tlp problems, which either means that this
delay routine is not as accurate as it should/could be or tlp is still broken.
* pmap_protect(): write back the range when doing a r/w -> r/o
transition; see the sketch after this list. (Still leave the block
concerned with this in pmap_clean_page() disabled, for now.)
* pmap_pte_init_xscale(): Disable read/write-allocate for now, until
we figure out why sometimes cache lines of NULs get deposited into
file data. Also, make sure ECC protection of page table access is
disabled for now.
* xscale_setup_minidata(): Make sure the mini-data cache is configured
write-back with read/write-allocate.
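The pmap_protect() item boils down to something like this (a simplified
sketch; l2pte_writable() is an illustrative stand-in for the actual
writable-PTE test):

    /* pmap_protect(): r/w -> r/o transition for one page. */
    if ((prot & VM_PROT_WRITE) == 0 && l2pte_writable(opte)) {
            /* Push dirty lines out before write permission goes away. */
            cpu_dcache_wb_range(va, NBPG);
    }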
internally). Move arm/iomd/pms* to arm/iomd/opms*. Mechanical change,
tested by cross-compiling a kernel from i386.
Approved by christos.
XXX: What are arm/arm32/conf.c and arm/include/conf.h good for?
* Pull in dev/mii/files.mii from conf/files, rather than playing
the magic "files include order" dance in N machine-dependent
configuration definitions.
cache mode. Add a new XSCALE_CACHE_WRITE_THROUGH option for people who
are paranoid about the cache-related errata (you *do* have to line up
the planets correctly to trip them, but having the option is useful).
file, <arm/cpuconf.h>, which pulls in "opt_cputypes.h" and then defines
the following:
* CPU_NTYPES -- how many CPU types are configured into the kernel. What
you really want to know is "== 1" or "> 1".
* Defines ARM_ARCH_2, ARM_ARCH_3, ARM_ARCH_4, ARM_ARCH_5, depending
on which ARM architecture versions are configured (based on CPU_*
options). Also defines ARM_NARCH to indicate how many architecture
versions are configured.
* Defines ARM_MMU_MEMC, ARM_MMU_GENERIC, ARM_MMU_XSCALE depending on
which classes of ARM MMUs are configured into the kernel, and ARM_NMMUS
to determine how many MMU classes are configured.
Remove the needless inclusion of "opt_cputypes.h" in several places.
Convert remaining users to <arm/cpuconf.h>.
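In outline, the header boils down to counting which options are present,
along these lines (an abbreviated sketch; the real option-to-architecture
mapping covers more CPU types):

    #include "opt_cputypes.h"

    #if defined(CPU_ARM2) || defined(CPU_ARM250) || defined(CPU_ARM3)
    #define ARM_ARCH_2      1
    #else
    #define ARM_ARCH_2      0
    #endif

    #if defined(CPU_ARM6) || defined(CPU_ARM7)
    #define ARM_ARCH_3      1
    #else
    #define ARM_ARCH_3      0
    #endif

    #if defined(CPU_ARM7TDMI) || defined(CPU_ARM9) || defined(CPU_SA110)
    #define ARM_ARCH_4      1
    #else
    #define ARM_ARCH_4      0
    #endif

    #if defined(CPU_XSCALE_80200) || defined(CPU_XSCALE_80321)
    #define ARM_ARCH_5      1
    #else
    #define ARM_ARCH_5      0
    #endif

    #define ARM_NARCH       (ARM_ARCH_2 + ARM_ARCH_3 + ARM_ARCH_4 + ARM_ARCH_5)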
read/write-allocate line allocation policy.
On the i80321, this improves nearly every lmbench benchmark, dramatically
so for the ones that are sensitive to memory bandwidth (100-300%
improvement for these).
cache, as well. The mini-data cache is 2-way, so src and dst won't
clobber each other, and the smallness of the cache doesn't matter,
since we access each page once sequentially.
While we still have to do the initial clean of the source page, this
saves another 4K of main D$ pollution, and also means we don't have
to do 2 cache passes after the copy is complete (i.e. we can skip the
invalidation of the source page in the main cache, since it's no longer
there).
vs. XScale. Use the mini-data cache for the destination on XScale,
thus avoiding tossing out 4K of possibly-useful data from the main
data cache each time.
This significantly improves every test in lmbench.
and pte_l2_s_cache_mode. The cache-meaningful bits are different
for these descriptor types on some processor models.
* Add pte_*_cache_mask, corresponding to each above, which has a mask
of the cache-meaningful bits, and define those for generic and XScale
MMU classes. Note, the L2_S_CACHE_MASK_xscale definition requires
use of the Extended Small Page L2 descriptor (the "X" bit overlaps
with AP bits otherwise).
L1_C_PROTO_xscale; while they are supposed to be set to 1 on generic
ARM MMUs (according to the SA-110 and ARM920T manuals), they are listed
as "should be zero" in the i80200 manual.
1. Generic (compatible with ARM6)
2. XScale (can be used as generic, but also has certain nifty extensions).
Define abstract PTE bit definitions for each MMU class. If only one MMU
class is configured into the kernel (based on CPU_* options), then we
get the constants for that MMU class. Otherwise we indirect through
variables set up via set_cpufuncs().
XXX The XScale bits are currently the same as the generic bits. Baby steps.
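The indirection has roughly this shape (sketched from the pattern described
above; the real pmap.h covers many more constants):

    #if ARM_NMMUS > 1
    /* More than one MMU class configured: indirect through variables
     * that set_cpufuncs() fills in at boot time. */
    #define L2_S_PROT_U     pte_l2_s_prot_u
    #define L2_S_PROT_W     pte_l2_s_prot_w
    #define L2_S_PROT_MASK  pte_l2_s_prot_mask
    #elif ARM_MMU_GENERIC != 0
    #define L2_S_PROT_U     L2_S_PROT_U_generic
    #define L2_S_PROT_W     L2_S_PROT_W_generic
    #define L2_S_PROT_MASK  L2_S_PROT_MASK_generic
    #elif ARM_MMU_XSCALE != 0
    #define L2_S_PROT_U     L2_S_PROT_U_xscale
    #define L2_S_PROT_W     L2_S_PROT_W_xscale
    #define L2_S_PROT_MASK  L2_S_PROT_MASK_xscale
    #endif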
Significant cleanup, here, including better PTE bit names.
* Add XScale PTE extensions (ECC enable, write-allocate cache mode).
* Mechanical changes everywhere else to update for new pte.h. While
doing this, two bugs (as a result of typos) were fixed in
arm/arm32/bus_dma.c
evbarm/integrator/int_bus_dma.c
the lower bits; UVM provides us page-aligned addresses for
everything. For the paranoid, we'll leave KDASSERT()'s in
that check for this if the kernel is built with DEBUG.
Low-hanging fruit that shaves some cycles.
pmap.h and give them more descriptive names and better comments:
* PT_M -> PVF_MOD (page is modified)
* PT_H -> PVF_REF (page is referenced)
* PT_W -> PVF_WIRED (mapping is wired)
* PT_Wr -> PVF_WRITE (mapping is writable)
* PT_NC -> PVF_NC (mapping is non-cacheable; multiple mappings)
* Don't refer to VA 0, instead refer to a new variable: vector_page
* Delete the old zero_page_*() functions, replacing them with a new
one: vector_page_setprot().
* When manipulating vector page mappings in user pmaps, only do so if
the vector page is below KERNEL_BASE (if it's above KERNEL_BASE, the
vector page is mapped by the kernel pmap).
* Add a new function, arm32_vector_init(), which takes the virtual
address of the vector page (which MUST be valid when the function
is called) and a bitmask of vectors the kernel is going to take
over, and performs all vector page initialization, including setting
the V bit in the CPU Control register ("relocate vectors to high
address"), if necessary.
when the part being queried was mapped with a section (!), giving weird
results, and had become a mess of gotos.
Complete rewrite that cleans up the `goto' jungle entirely ... all the gotos
have been ripped out. The resulting code is much easier to read and might
even show a small performance gain.
I/O processors:
* The i80200 and the i80321 have the same CPU ID, so split the
CPU_XSCALE option into CPU_XSCALE_80200 and CPU_XSCALE_80321
options, and don't let them both be defined at the same time.
XXX May want to revisit this in the future.
* Split some registers common between the i80200 and i80321 into
<arm/xscale/xscalereg.h>.
* Rename a few existing functions.
pmap_copy_page() will never have any mappings. Therefore, it
is unnecessary to do a cache clean for that page.
Add assertions in #ifdef DEBUG that assert this invariant.
This shaves some cycles off the frequently-called pmap_zero_page()
and pmap_copy_page() (no need to look up the dst page's vm_page
structure, and one less function call to clean the page).
incorporate write-back support for processors that do not have a
write-through cache.
The current fb_devconfig structure now really holds the device's
configuration and the softc really only holds the attachment information.
This used to be mixed, giving rise to weird structures and cross references.
The number of vertical syncs before the video memory write-back is triggered
is configurable ... the default is to wait for 5 vsyncs, i.e. a minimum of
about 10 times a second, but more likely on the order of 12.5 times a second.
While printing is in progress no write-back is performed; it only happens
after the waiting time. The reasoning behind this is that as long as the
screen is being printed to, the cache will be purged of dirty data anyway
due to the processing and new screen memory usage.
on the port-acorn32's TODO list for quite some time:
- when the serial console is selected, don't exclude the screen
altogether; currently the keyboard is still not attached but that might be
a configuration problem in the GENERIC console or a failure to explicitly
connect to a wsmux. This needs further investigation.
- create a framework for the display memory writeback on vsync for
StrongARM processors since they don't have a write-through cache. This is to
solve the lazy screen update that is very evident in single user mode on
these processors; the cache isn't flushed/written back that often and parts
of the screen can thus be resident in the cache but not written out to
memory yet.
- clean up some loose ends in the code.
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
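In a driver's ioctl routine the convention now looks like this (an
illustrative sketch; foo_ioctl and FOO_GETSTATE are made-up names):

    int
    foo_ioctl(dev_t dev, u_long cmd, caddr_t data, int flag, struct proc *p)
    {
            switch (cmd) {
            case FOO_GETSTATE:
                    /* ...handle the command... */
                    return 0;
            default:
                    return EPASSTHROUGH;    /* unhandled; never return -1 */
            }
    }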
of the cache in a static table. Note that the table isn't complete --
contributions of cache details for CPUs whose data sheets I haven't got are
welcome.
become ippp (ISDN ppp) and irip (ISDN raw IP). The character devices are now
called: /dev/isdn (isdnd <-> kernel communication), /dev/isdnctl (dialing
and other control), /dev/isdntrc* (tracing), /dev/isdnbchan* (raw B channel
access, i.e. for user land PPP) and /dev/isdntel* (telephone devices, i.e.
for answering machines).
issue an instruction that caused the late abort handler to be called for
which the kernel had no support built in.
It now only panics when this happens in the kernel; otherwise it delivers
a SIGSEGV to the process.
but not used, resulting in a compiler error. Splitting the declaration
and the initialisation solves this.
It would be better not to declare the flag at all when ARMFPE isn't
enabled, but that would just add to the #ifdef jungle.
even bother probing for an FPA. If ARMFPE is configured, always use it,
even if there's an FPA (since it provides the FPA support code). Move all
printfs about FPAs into armfpe_init.c.
This means I can delete the last two elements from struct _cpu, so that the
structure, and with it the whole of <arm/cpus.h>, is redundant and can be
deleted.