Commit Graph

295 Commits

Author SHA1 Message Date
briggs 20267a208f Do not trim 'offset' from 'len' in _bus_dmamap_sync_linear(). 2002-08-17 05:14:10 +00:00
briggs d86c947b8c Inline bus_dma_inrange() and bus_dmamap_sync_*(). 2002-08-17 01:15:15 +00:00
thorpej 50fe583069 Must ... micro ... optimize!
* Save an instruction in the transition from idle to have-process-to-
  switch-to, and eliminate two instructions that cause datadep-stalls
  on StrongARM And XScale (one in each idle block).
* Rearrange some other instructions to avoid datadep-stalls on StrongARM
  and XScale.
* Since cpu_do_powersave == 0 is by far the common case, avoid a
  pipeline flush by reordering the two idle blocks.
2002-08-17 01:08:21 +00:00
thorpej ebff575bc3 * Add a new machdep.powersave sysctl, which controls the use of
the CPU's "sleep" function in the idle loop.
* Default all CPUs to not use powersave, except for the PDA processors
  (SA11x0 and PXA2x0).

This significantly reduces inteterrupt latency in high-performance
applications (and was good to squeeze another ~10% out of an XScale
IOP on a Gig-E benchmark).
2002-08-16 15:25:53 +00:00
briggs fa81e3d75e * Use local label names (.Lfoo vs. (Lfoo or foo))
* When moving from cpsr, use "cpsr" instead of "cpsr_all" (which is
   provided, but doesn't make sense since mrs doesn't support fields
   like msr does).
2002-08-15 01:37:01 +00:00
thorpej ad73349331 We only need to modify the CPSR's control field, so use cpsr_c rather
than cpsr_all.
2002-08-14 23:23:06 +00:00
chris f4c605201d Tweak asm to avoid a couple of stalls. 2002-08-14 23:07:36 +00:00
thorpej b45159bad0 When doing PREREAD sync operations, if the start and end addresses
of the range are aligned to a cacheline boundary, when do a dcache-inv
operation, rather than a dcache-wbinv operation.

XXX It could be a little smarter (align using wbinv, inv, then finish
up using wbinv), but even this simple change is good for a nearly 40%
improvement in my test case on XScale.
2002-08-14 22:56:55 +00:00
briggs a957deca48 G/c cowfault. 2002-08-14 21:52:36 +00:00
thorpej 203dd6b325 * Add an ARM32_DMAMAP_COHERENT flag to indicate that a loaded DMA
map contains "coherent" (non-cached in ARM-land) mappings.
* Set ARM32_DMAMAP_COHERENT in the map at the start of a load operation,
  and clear it in _bus_dmamap_load_buffer() if we encounter any cacheable
  mappings.
* In _bus_dmamap_sync(), if the map is marked COHERENT, skip any cache
  flushing.
2002-08-14 20:50:37 +00:00
thorpej d00a4a068d Whe making a mapping "coherent", clear *ALL* the cache bits, not
just L2_B and L2_C.
2002-08-14 19:21:50 +00:00
thorpej 98d6ec0b89 Add the brutal hack that allows us to limp along using the read/write
cache line allocation policy on XScale CPUs: in pmap_enter(), if the
pmap is the kernel pmap, clear the X-bit in the PTE, thus disabling
read/write-allocate for managed kernel mappings.

Yes, this is ugly.  But it makes userland code run with r/w-allocate,
which is a huge improvement on systems with low core memory performance.
2002-08-13 03:36:30 +00:00
thorpej d7be866fc8 Rearrange the beginning of cpu_switch() slightly to reduce data-dep
stalls on StrongARM and XScale.
2002-08-12 21:00:12 +00:00
bjh21 664bea62e3 __KERNEL_RCSID 2002-08-12 20:19:04 +00:00
bjh21 ca86069053 When pcb_onfault is set, pass the error code we get from uvm_fault()
(or EFAULT if we never called uvm_fault) to the onfault handler in R0,
in case it wants to use it.
2002-08-12 20:17:37 +00:00
thorpej 3d6f9f69ab Make a slight tweak to register usage to save an instruction. 2002-08-12 19:33:01 +00:00
bjh21 657216ff0f Remove a file which was accidentally resurrected. 2002-08-11 23:20:11 +00:00
bjh21 206c97ccc2 Move the arm32 copystr.S from arch/arm/arm32 to arch/arm/arm and add support
for 26-bit modes (basically saving R14 when we might get a page fault).
Use it on all ARM architectures now.
2002-08-11 23:17:24 +00:00
bjh21 b6228a7d06 New, improved version of copyin(), copyout(), and kcopy() by Allen Briggs.
This version works on both 26-bit and 32-bit machines.  For large copies,
it's up to three times as fast as the old arm32 version and five times as
fast as the old arm26 version.  For small copies it seems to be even faster
(getrusage() is apparently over ten times faster on an ARM610).

Hooray for Allen!
2002-08-11 21:19:12 +00:00
thorpej 76730bd0cc Tidy up pmap_clean_page() a little, and reenable some code that was
disabled previously: Skip cleaning mappings which are read-only, because
the pmap (now) does clean pages on a r/w -> r/o transition.
2002-08-10 00:48:35 +00:00
thorpej 006a578742 Clean up some warts in pmap_protect(). 2002-08-10 00:11:51 +00:00
thorpej 15a5e8f238 cpu_fork(): If PMCs are not enabled in the parent, clear the machine-
dependent PMC state in the child.
2002-08-09 23:44:17 +00:00
thorpej 6ce0a206cc Add an XSCALE_CACHE_READ_WRITE_ALLOCATE option for people who
want to play fast-and-loose.
2002-08-09 21:49:09 +00:00
thorpej 884bc64586 Add some code, conditional on PMAP_ALIAS_DEBUG, that can be used to
hunt for virtual aliases between managed (pmap_enter) and non-managed
(pmap_kenter_pa) mappings.
2002-08-09 18:22:59 +00:00
thorpej c979315325 Reduce stalls on StrongARM and XScale by waiting one insn before using
the result of a load.
2002-08-09 06:18:24 +00:00
thorpej afe3274eed Use ldrbt/strbt. Some other random cleanup. 2002-08-09 06:03:02 +00:00
thorpej 410785d6f0 Use ldrt/strt. 2002-08-09 04:13:20 +00:00
thorpej fdcc8560e4 Speed up bcopy_page() on the XScale slightly by using the "pld"
insn (prefetch) to look-ahead to the next chunk while we copy the
current chunk.

This could probably use a bit more tuning.
2002-08-07 16:21:29 +00:00
briggs 0b956d0b8b Implement pmc(9) -- An interface to hardware performance monitoring
counters.  These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface.  Initially, the Intel XScale
counters are the only ones supported.
2002-08-07 05:14:47 +00:00
thorpej 26bc8b27f4 - pmap_remove(): unmap the PTEs *after* we have finished with the
page tables.
- pmap_enter(): if making a mapping for the same PA rw->ro, write-back
  the cache before doing so.
- pmap_clearbit(): if revoking REF on a page, make sure to wbinv the
  cache if the page has write permission, else inv the cache if the page's
  PTE is valid (XXX we actually wbinv in this case, as well, due to lack
  of idcache_inv_range()).  Only flush the TLB if the PTE changed.
2002-08-06 21:43:51 +00:00
thorpej 0886c8cc0f Rearrange the exit path so that we don't do a idcache_wbinv_all *twice*
when a process exits.
2002-08-06 19:20:29 +00:00
thorpej 62d83d05b1 * Pass proc0 to switch_exit(), to make this a little more like the
nathanw_sa branch.
* In switch_exit(), set the outgoing-proc register to NULL (rather than
  proc0) so that we actually use the "exiting process" optimization in
  cpu_switch().
2002-08-06 17:44:35 +00:00
thorpej f7328ddbe7 Add dmoverio. 2002-08-02 00:50:25 +00:00
thorpej dce4476374 Overhaul how DMA ranges work in the ARM bus_dma implementation.
A new "arm32_dma_range" structure now describes a DMA window, with
a system address base, bus address base, and length.  In addition to
providing info about which memory regions are legal for DMA, the new
structure provides address translation support, as well.

As before, if a tag does not list any ranges, then all addresses are
considered valid, and no DMA address translation is performed.

This allows us to remove a large chunk of code which was duplicated and
tweaked slightly (to do the address translation) from the stock ARM
bus_dma in the XScale IOP and ARM Integrator ports.

Test compiled on all ARM platforms, test booted on Intel IQ80321 and Shark.
2002-07-31 17:34:23 +00:00
thorpej 79af00bddb Move the calls to uvm_page_physload() out of pmap_bootstrap() and
into platform-specific initialization code, giving platform-specific
code control over which free list a given chunk of memory gets put
onto.

Changes are essentially mechanical.  Test compiled for all ARM
platforms, test booted on Intel IQ80321 and Shark.

Discussed some time ago on port-arm.
2002-07-31 00:20:51 +00:00
thorpej d3aa5664b7 Move the uvm_setpagesize() call to platform-dependent code in preparation
for other changes to pmap_bootstrap().
2002-07-30 16:16:38 +00:00
thorpej 3dcad9ac9e Don't use pmap_kenter_pa() in pmap_map(); doing so causes an assertion
failure in pmap_kenter_pa().
2002-07-30 16:07:23 +00:00
thorpej 3ab4598cc0 Add sysmon at cdev 101. 2002-07-29 18:26:58 +00:00
thorpej 7b652cb939 Change the way that DMA map syncs are done. Instead of remembering
the virtual address for each DMA segment, just cache a pointer to the
original buffer/buftype used to load the DMA map, and use that.  This
lets us shrink the bus_dma_segment_t down from 12 bytes to 8, and the
cache flushing is also more efficient.

Tested on an i80321 -- changes to others are mechanical.
2002-07-28 17:54:05 +00:00
briggs c13ee269dd Handle i80200 step D0 and i80321 step B0 2002-07-22 18:17:42 +00:00
ichiro 6349df15da cdev_tty_init(NIXPCOM,ixpcom) move to end of cdevsw array 2002-07-22 01:12:24 +00:00
simonb 895a23e8ae Add an "#ifndef NIXPCOM" check so that this builds on non-evbarm. 2002-07-20 00:26:51 +00:00
thorpej 3912e469dd Rename cdev_systrace_init() to cdev_clonemisc_init(), so it can
be properly used by any misc. cloning device.  While here, correct
a comment to indicate that "open" is the only entry point and that
everything else is handled with fileops.
2002-07-19 16:38:14 +00:00
ichiro 2255ed4ecb add ixpcom to cdevsw 2002-07-16 14:20:04 +00:00
ichiro 83c0b66d47 add cpu id for "PXA250/210 3rd version CPUcore".
for using many PDA/xscale-core.
2002-07-10 07:00:50 +00:00
thorpej 47506c123a Add kttcp device. 2002-06-30 23:30:07 +00:00
briggs 1b3d605b4e Remove complaint: bus_dmamap_destroy() called for map with valid
mappings bus_dma(9) states: "In the event that the DMA handle contains
a valid mapping, the mapping will be unloaded via the same mechanism
used by bus_dmamap_unload()."  And some drivers do mean to skip the
unload step.
2002-06-28 15:21:00 +00:00
thorpej 43e7ad972b Garbage-collect sigframe references. 2002-06-23 00:16:59 +00:00
christos 3b50728cf4 MD systrace gluons. 2002-06-17 16:32:57 +00:00
thorpej ffe1440f29 Add the CPU ID for the 600MHz i80321 part. 2002-06-07 18:25:28 +00:00