Commit Graph

468 Commits

Author SHA1 Message Date
scw 6b08b996ba Fix the bug reported by Richard Earnshaw in port-arm32/21349.
Make sure to check the access permissions before doing
ref/mod/domain fixups. This is particularly important
on machines with ARM_VECTORS_LOW.
2003-04-28 15:57:23 +00:00
briggs a2f6e1f09a Add arm32 machine-specific remote kgdb support. Largely
from PR port-arm/15530 by bsh@, but with some updates from
me, including a fresh arm32/kgdb_machdep.c--ported from pc532.
2003-04-28 01:54:49 +00:00
chris 70a9a33cc8 Remove a strh. I don't think it's available on ARMv3, and it doesn't work
on acorn32 machines with an SA-110 in them, as the bus doesn't support halfword
transfers.
2003-04-26 17:50:21 +00:00
thorpej d1c431c7e1 pmap_link_l2pt(): If not ARM32_NEW_VM_LAYOUT, add an assertion that
the VA that the page table maps is aligned to a 4MB boundary.
2003-04-22 13:49:48 +00:00
thorpej bbef46a7e9 Some ARM32_PMAP_NEW-related cleanup:
* Define a new "MMU type", ARM_MMU_SA1.  While the SA-1's MMU is basically
  compatible with the generic, the SA-1 cache does not have a write-through
  mode, and it is useful to have an indication of this.
* Add a new PMAP_NEEDS_PTE_SYNC indicator, and try to evaluate it at
  compile time.  We evaluate it like so:
  - If SA-1-style MMU is the only type configured -> 1
  - If SA-1-style MMU is not configured -> 0
  - Otherwise, defer to a run-time variable.
  If PMAP_NEEDS_PTE_SYNC might evaluate to true (SA-1 only or run-time
  check), then we also define PMAP_INCLUDE_PTE_SYNC so that e.g. assembly
  code can include the necessary run-time support.  PMAP_INCLUDE_PTE_SYNC
  largely replaces the ARM32_PMAP_NEEDS_PTE_SYNC manual setting Steve
  included with the original new pmap.
* In the new pmap, make pmap_pte_init_generic() check to see if the CPU
  has a write-back cache.  If so, init the PT cache mode to C=1,B=0 to get
  write-through mode.  Otherwise, init the PT cache mode to C=1,B=1.
* Add a new pmap_pte_init_arm8().  Old pmap, same as generic.  New pmap,
  sets page table cacheability to 0 (ARM8 has a write-back cache, but
  flushing it is quite expensive).
* In the new pmap, make pmap_pte_init_arm9() reset the PT cache mode to
  C=1,B=0, since the write-back check in generic gets it wrong for ARM9,
  as we use write-through mode all the time on ARM9 right now.  (What
  this really tells me is that the test for write-through cache is less
  than perfect, but we can fix that later.)
* Add a new pmap_pte_init_sa1().  Old pmap, same as generic.  New pmap,
  does generic initialization, then resets page table cache mode to
  C=1,B=1, since C=1,B=0 does not produce write-through on the SA-1.
2003-04-22 00:24:48 +00:00
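A minimal sketch (not the committed code) of how the compile-time evaluation of PMAP_NEEDS_PTE_SYNC described above could look, assuming the per-class MMU counts from <arm/cpuconf.h>; the run-time variable name is illustrative:

	#if ARM_MMU_SA1 == 1 && (ARM_MMU_GENERIC + ARM_MMU_XSCALE) == 0
	#define PMAP_NEEDS_PTE_SYNC	1	/* SA-1 style MMU is the only type */
	#define PMAP_INCLUDE_PTE_SYNC
	#elif ARM_MMU_SA1 == 0
	#define PMAP_NEEDS_PTE_SYNC	0	/* no SA-1 style MMU configured */
	#else
	extern int pmap_needs_pte_sync;	/* hypothetical run-time flag */
	#define PMAP_NEEDS_PTE_SYNC	pmap_needs_pte_sync
	#define PMAP_INCLUDE_PTE_SYNC	/* pull in run-time support too */
	#endif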
thorpej 4b39c84472 Reinstate one change from rev. 1.12, but differently. Preload r2 with
0 before frobbing the control register, and use r2 in the ARMv4 TLB
flush.
2003-04-20 16:21:40 +00:00
thorpej b534f5853c Back out previous. There were several problems with the patch that
was checked in:
* It was not actually disabling the MMU, and so jumping to the
  reset vector would happily cause a panic(), since it would be
  the kernel's reset vector, not the ROM's.
* In the event the system was using high vectors, VECRELOC was not
  getting cleared, which has the potential to wreak havoc when re-entering
  the ROM.
* It was totally broken for CPUs < ARMv4; you still need to disable
  the MMU on those, and just skip the ARMv4 TLB flush.
* The code that was checked in would only work if the kernel is mapped
  VA==PA.  For systems where the kernel is NOT mapped VA==PA, you only
  get the prefetch depth # of insns (2) after the MMU is turned off before
  you have to fix up the PC.

Backing out the change fixes rebooting on several evbarm platforms.
2003-04-20 15:42:51 +00:00
thorpej ec678aa9cd Use L1_S_MAPPABLE_P() and L2_L_MAPPABLE_P(). 2003-04-18 23:46:12 +00:00
thorpej 78b1b81e74 Add a comment indicating that the current method of enabling high vectors
requires that the CPU control register be properly readable.  I believe that
all CPUs that have high vector support have a readable CPU control register,
but if we ever encounter one that does not, then we'll have to adjust this
code.
2003-04-18 22:30:05 +00:00
scw 3fe47173f5 Didn't mean to leave PMAP_DEBUG enabled ... 2003-04-18 11:55:26 +00:00
scw 41a1932e58 Add the generic arm32 bits of the new pmap, contributed by Wasabi Systems.
Some features of the new pmap are:

 - It allows L1 descriptor tables to be shared efficiently between
   multiple processes. A typical "maxusers 32" kernel, where NPROC is set
   to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
   with just 4 L1s. This completely solves the problem of running out
   of contiguous physical memory for allocating new L1s at runtime on a
   busy system.

 - Much improved cache/TLB management "smarts". This change ripples
   out to encompass the low-level context switch code, which is also
   much smarter about when to flush the cache/TLB, and when not to.

 - Faster allocation of L2 page tables and associated metadata thanks,
   in part, to the pool_cache enhancements recently contributed to
   NetBSD by Wasabi Systems.

 - Faster VM space teardown due to accurate reference tracking of L2
   page tables.

 - Better/faster cache-alias tracking.

The new pmap is enabled by adding options ARM32_PMAP_NEW to the kernel
config file, and making the necessary changes to the port-specific
initarm() function. Several ports have already been converted and will
be committed shortly.
2003-04-18 11:08:24 +00:00
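Per the note above, the kernel config fragment enabling the new pmap would be simply:

	options 	ARM32_PMAP_NEW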
scw 9c5cceb804 In arm32_vector_init(), if the vector page is ARM_VECTORS_HIGH, make
sure the CPU_CONTROL_VECRELOC bit is set in the cpu control register
before returning.
2003-04-18 10:51:35 +00:00
thorpej bcea7d5f28 Use cached physical addresses for mbufs and clusters to save having
to extract the physical address from the virtual.

On the ARM, also use the "read-only at MMU" indication to avoid a
redundant cache clean operation.

Other platforms should use these two as examples of how to use these
new pool/mbuf features to improve network performance.  Note this requires
a platform to provide a working POOL_VTOPHYS().

Part 3 in a series of simple patches contributed by Wasabi Systems
to improve network performance.
2003-04-09 18:51:35 +00:00
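A minimal sketch of what a platform's POOL_VTOPHYS() might be on a port where kernel VAs translate directly; the vtophys() helper is an assumption:

	#define POOL_VTOPHYS(va)	vtophys((vaddr_t)(va))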
thorpej 9a8042f242 Use PAGE_SIZE rather than NBPG. 2003-04-08 22:57:53 +00:00
thorpej 95281cabad Use PAGE_SIZE rather than NBPG. 2003-04-01 23:19:08 +00:00
bsh 105db01dcd For Intel PXA2[15][05] processors, select write-back/write-through
cache based on CPU id: write-through on PXA2[15]0 B2 stepping and
earlier, write-back on C0 and C1 steppings (a.k.a. PXA2[15]5 A0).

The options XSCALE_CACHE_WRITE_{THROUGH,BACK} can override this.

For XScale CPUs other than PXA2xx, XSCALE_CACHE_WRITE_THROUGH works
the same as before.
2003-03-29 07:58:16 +00:00
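A rough sketch of the stepping-based selection described above; the stepping test helper is hypothetical:

	int write_through = 0;
	#if defined(XSCALE_CACHE_WRITE_THROUGH)
	write_through = 1;			/* option forces write-through */
	#elif !defined(XSCALE_CACHE_WRITE_BACK)
	if (pxa2x0_b2_or_earlier(cpufunc_id()))	/* hypothetical stepping check */
		write_through = 1;
	#endif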
mycroft 49f94a02b4 Remove references to variables that aren't used here. 2003-03-27 19:42:30 +00:00
mycroft 0c23a8613a Fix multiple bugs in the way we do the v4 MMU disable -- it was blasting way
too many bits (including some reserved ones) and was writing the wrong value
for the TLB flush.
Also, if the flag is off, don't write the control register!
2003-03-26 17:36:56 +00:00
chris c9033077aa Garbage collect pmap_map, the last (and only?) use has been removed. 2003-03-23 15:59:23 +00:00
chris a97b660835 When doing a kernel dump use the pmap_k* funcs. Also make sure that all
data is written to RAM.  This avoids issues with TLBs not being flushed,
etc.

As discussed a long time ago on port-arm
2003-03-23 15:49:25 +00:00
thorpej 20c4b7b844 Change pcb32_pagedir to a paddr_t (after all, it's used as a paddr_t
everywhere in the code).
2003-02-23 23:40:01 +00:00
chris 203288830a Convert a few types into things that are more accurate, mostly:
ints to unsigned int/u_int where they shouldn't go negative;
ints to boolean_t where they're being used as bools.

No real functional change (in the produced asm a few condition codes changed)
2003-02-21 00:23:03 +00:00
rjs ce385ae9b3 Add CPU IDs for PXA B2 and C0 steppings. 2003-02-14 16:00:33 +00:00
chris 3e2914e858 bus dma memory is allocated as M_DMAMAP so free it as M_DMAMAP, not DEVBUF. 2003-02-03 23:34:50 +00:00
wiz cd68fb44fb guarantee, not guarentee. Idea from miod@openbsd. 2003-02-02 10:24:38 +00:00
thorpej 23bc250391 Merge the nathanw_sa branch. 2003-01-17 21:55:23 +00:00
wiz 7e681f7063 interrupt with two rs. 2003-01-06 13:04:54 +00:00
wiz 5e442fbbdd specified, not specifed. 2003-01-06 12:38:47 +00:00
thorpej 074858daeb Fiddle with current_intr_depth in assembly code again. Because we
have just pushed a frame, we can make some assumptions that the
compiler cannot as easily make, and can thus do it slightly more
efficiently.
2003-01-03 00:38:16 +00:00
thorpej b33e60be39 Clean up evbarm interrupt support a little:
* Define an ARM_INTR_IMPL option, which specifies a header file
  describing the interrupt implementation for the platform.  Use
  this instead of the list of EVBARM_BOARDTYPE checks.
* Make the s3c2xx0 interrupt dispatch code a bit more generic, and move
  it to a generic location so that other platforms can use it.

This eliminates all uses of the EVBARM_BOARDTYPE stuff, so delete it.
2003-01-02 23:37:53 +00:00
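For illustration, the option points at a platform header, roughly like this (the board header path is just an example):

	options 	ARM_INTR_IMPL="<arm/xscale/i80321_intr.h>"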
reinoud 779842e0f8 Remove spurious declaration of bootconfig structure since that is already
done in bootconfig.h
2002-12-28 20:40:21 +00:00
chris 01bbc5d994 Add a debug assert that wired pages provide protection flags in the flags
argument as well.

Also update a couple of debug messages to NPDEBUG.
2002-11-24 01:09:09 +00:00
chris 3dd552c1b2 Fixes DEBUG kernels not making it into multiuser on cats (as spotted by
nick).
When wiring a page with pmap_enter you must supply the protection in the
flags as well as in the prot.
2002-11-24 01:07:47 +00:00
chs 4b2625143d change uvm_uarea_alloc() to indicate whether the returned uarea is already
backed by physical pages (ie. because it reused a previously-freed one),
so that we can skip a bunch of useless work in that case.
this fixes the underlying problem behind PR 18543, and also speeds up fork()
quite a bit (eg. 7% on my pc, 1% on my ultra2) when we get a cache hit.
2002-11-17 08:32:43 +00:00
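A usage sketch of the changed interface, assuming it returns TRUE when the recycled uarea already has backing pages:

	vaddr_t uaddr;

	if (uvm_uarea_alloc(&uaddr)) {
		/* cache hit: pages already present, skip the wiring work */
	} else {
		/* fresh virtual range: fault in/wire the u-area pages */
	}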
chris 164b37a80c Tweak a few minor things:
when looking to re-enable caching, only do so if the pages aren't all already
cached.
Convert some ints to unsigned int.  (scarily this actually shows the biggest
decrease in timing for my benchmark; I guess the compiler can optimise better)
2002-11-12 22:14:21 +00:00
chris e8cceb3e82 gratuitous whitespace and de-__P'ing. No functional change. 2002-11-11 20:34:03 +00:00
chris 2fc7aadded A few minor tweaks.
Use pmap_free_pvs in pmap_remove, should save on the overhead of freeing
each pv on its own.

Correctly set ptp when calling pmap_enter_pv, this adds more overhead, but
the effect is minimal.  Timings show that it increases gmake's make configure
step from 2:07.90 to 2:08.90.  I've more optimisations planned that should
negate this increase.
2002-11-11 09:34:44 +00:00
chris cf54ec0397 Remove unused pa variable (it's assigned but not used any more) 2002-11-11 08:58:05 +00:00
jdolecek c82ab2eb79 Now that mem_no is emitted by config(8), there is no reason to keep a
copy of the more or less identical iskmemdev() for every arch; move the function
to spec_vnop.c, and g/c the machine-dependent copies.
2002-10-26 13:50:17 +00:00
jdolecek e0cc03a09b Merge kqueue branch into -current.
kqueue provides a stateful and efficient event notification framework.
Currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.

kqueue is supported by all writable filesystems in the NetBSD tree
(with the exception of Coda) and all device drivers supporting poll(2).

Based on work done by Jonathan Lemon for FreeBSD;
the initial NetBSD port was done by Luke Mewburn and Jason Thorpe.
2002-10-23 09:10:23 +00:00
bsh d5fb42a86c non-inline version of atomic_{set,clear}_bit(), defined when
ATOMIC_SET_BIT_NONINLINE_REQUIRED is defined.
(extracted from arm/arm32/locore.S)
2002-10-19 12:46:57 +00:00
bsh 7b6639153c make atomic_{set,clear}_bit() inline for arm32 ports, and
add <machine/atomic.h> for them.
2002-10-19 12:22:33 +00:00
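A minimal sketch of an inline uniprocessor atomic_set_bit() of this kind; the real code may mask IRQs directly rather than going through spl:

	static __inline void
	atomic_set_bit(u_int *addr, u_int bit)
	{
		int s = splhigh();	/* block interrupts around the RMW */

		*addr |= bit;
		splx(s);
	}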
bjh21 a531a4ae8e Undo recent cpu_switch register usage changes in order to decrease nathanw_sa
merge pain.
2002-10-19 00:10:53 +00:00
bjh21 7dd8880e90 The grand cpu_switch register reshuffle!
In particular, use r8 to hold the old process, and r7 for medium-term
scratch, saving r0-r3 for things we don't need saved over function
calls.  This gets rid of five register-to-register MOVs.
2002-10-18 23:06:33 +00:00
bjh21 3d1b6867f0 In cpu_switch(), stack more registers at the start of the function,
and hence save fewer into the PCB.  This should give me enough free
registers in cpu_switch to tidy things up and support MULTIPROCESSOR
properly.  While we're here, make the stacked registers into an
APCS stack frame, so that DDB backtraces through cpu_switch() will
work.

This also affects cpu_fork(), which has to fabricate a switchframe and
PCB for the new process.
2002-10-18 21:32:57 +00:00
bsh 475d72b0be Fix a bug that sneaked into cpu_reset() during the "- . - 8" purge
(s/mov pc,lr/mov lr,pc/).
2002-10-15 23:10:32 +00:00
bjh21 441e8907fe Switch to using the MI C versions of setrunqueue() and remrunqueue().
GCC produces almost exactly the same instructions as the hand-assembled
versions, albeit in a different order.  It even found one place where it
could shave one off.  Its insistence on creating a stack frame might slow
things down marginally, but not, I think, enough to matter.
2002-10-15 20:53:38 +00:00
bjh21 d599df9587 Continue the " - . - 8" purge. Specifically:
add	rd, pc, #foo - . - 8		->	adr	rd, foo
ldr	rd, [pc, #foo - . - 8]		->	ldr	rd, foo

Also, when saving the return address for a function pointer call, use
"mov lr, pc" just before the call unless the return address is somewhere
other than just after the call site.

Finally, a few obvious little micro-optimisations like using LDR directly
rather than ADR followed by LDR, and loading directly into PC rather than
bouncing via R0.
2002-10-14 22:32:50 +00:00
chris a28f4c93a2 Fix arm kernel build breaks for non multiprocessor systems. 2002-10-13 21:14:28 +00:00
bjh21 3d91ec9fdd Instead of "add rd, pc, #foo - . - 8", use either "adr rd, foo" or (where
appropriate) "mov lr, pc".  This makes things slightly less confusing and
ugly.
2002-10-13 14:54:47 +00:00
bjh21 85386dce51 Use cpu_number() to find curpcb rather than assuming we're on CPU 0. 2002-10-13 14:24:09 +00:00
bjh21 75248cc7a1 It appears that MI code requires ci_cpuid to be the CPU number of the CPU
in question, whereas the ARM code was using it to hold the model
identification.  To fix this, rename:

ci_cpuid -> ci_arm_cpuid
ci_cputype -> ci_arm_cputype (for consistency)
ci_cpurev -> ci_arm_cpurev (ditto)
ci_cpunum -> ci_cpuid

This makes top(1) give correct CPU numbers in its "STATE" column (all 0 for
now).
2002-10-13 12:24:57 +00:00
bjh21 d8fd346734 Remember the location of each CPU's idle PCB in struct cpu_info.
Move allocation of the idle PCB from hydra.c to cpu.c and add some
extra initialisation from cpu_fork().
2002-10-12 21:06:46 +00:00
bjh21 a7385c575f Move curpcb into struct cpu_info in MULTIPROCESSOR kernels. 2002-10-12 12:20:08 +00:00
bjh21 6ae19cc8cd Use ADR rather than an explicit ADD from PC. 2002-10-09 22:28:03 +00:00
bjh21 67ba9f99bf Remove an outdated register assignment comment. 2002-10-08 23:48:24 +00:00
bjh21 3832819227 Minimal changes to allow a kernel with "options MULTIPROCESSOR" to compile
and boot multi-user on a single-processor machine.  Many of these changes
are wildly inappropriate for actual multi-processor operation, and correcting
this will be my next task.
2002-10-05 13:46:57 +00:00
bjh21 b828507087 constify various string tables. 2002-10-01 22:33:10 +00:00
provos 0f09ed48a5 Remove trailing \n in panic(). Approved by perry. 2002-09-27 15:35:29 +00:00
thorpej 71404bb533 Don't include <sys/map.h>. 2002-09-25 22:21:01 +00:00
chs f01058c887 rename the existing pmap_remove_all() here to pmap_page_remove()
(ala the x86 pmap) to avoid conflicting with the new pmap interface
function of the same name.
2002-09-22 07:56:57 +00:00
nathanw 2cab03d64a In the fault handler, record growth of the stack, so that core dumps
actually contain the entire stack.
2002-09-21 00:29:04 +00:00
gehenna 77a6b82b27 Merge the gehenna-devsw branch into the trunk.
This merge changes the device switch tables from static arrays to
tables dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

	device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
  using this grammar.

- Added the new naming convention.
  The name of the device switch must be <prefix>_[bc]devsw for auto-generation
  of device switch tables.

- Backward compatibility for loading block/character device
  switches via the LKM framework is broken. This is necessary to convert
  from block/character device major to device name at runtime, and vice versa.

- The restriction on assigning device majors via LKM is completely removed.
  We don't need to reserve LKM entries for dynamic loading of device switches.

- At compile time, the device major number list is packed into the kernel, and
  the LKM framework refers to it to assign device major numbers dynamically.
2002-09-06 13:18:43 +00:00
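Using the new grammar, a majors.<arch> line might read (entry shown is illustrative, matching the sysmon cdev 101 addition elsewhere in this log):

	device-major	sysmon	char 101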
jdolecek 8839507f5b whitespace fix past __KERNEL_RCSID() 2002-09-05 18:34:00 +00:00
thorpej 212cb9f78d Add machine-dependent bits of RAS for arm32. 2002-08-31 03:07:32 +00:00
thorpej 139cdc3125 Make nbuf, nswbuf, and bufpages unsigned. Make all operations on these
variables unsigned, and update places where their values are printed.
2002-08-25 20:21:33 +00:00
thorpej ffdedb6d80 In pmap_map_in_l1() and pmap_unmap_in_l1(), make sure that the VA
that is passed in is already aligned to a 4M super-section.
2002-08-24 03:10:40 +00:00
thorpej d158b3a37a When we allocate a PTP, make sure the offset we specify is for
the 4M super-section that the PTP will map, not some random 1M
chunk of it.  This gives the PTP hint code a much better chance
of working properly, and allows us to tidy up the code that
flushes a PTP from the cache in pmap_destroy().
2002-08-24 02:50:53 +00:00
thorpej 77a6866508 Enable caching on kernel and user page tables. This saves having
to do uncached memory access during VM operations (which can be
quite expensive on some CPUs).

We currently write-back PTEs as soon as they're modified; there is
some room for optimization (to write them back in larger chunks).
For PTEs in the APTE space (i.e. PTEs for pmaps that describe another
process's address space), PTEs must also be evicted from the cache
completely (PTEs in PTE space will be evicted during a context switch).
2002-08-24 02:16:30 +00:00
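A sketch of the write-back-after-modify pattern described above, using the cpufunc dcache hook:

	*ptep = npte;					/* modify the PTE */
	cpu_dcache_wb_range((vaddr_t)ptep, sizeof(*ptep));	/* push it out */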
thorpej 6cc7c1c1ff * Add PTE_SYNC() and PTE_SYNC_RANGE() macros. These don't actually do
anything yet.
* Use PTE_SYNC() and PTE_SYNC_RANGE() in some obvious places, i.e.
  where vtopte() is used.
2002-08-22 01:13:53 +00:00
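Consistent with the note that they do nothing yet, the placeholders might simply be (a sketch):

	#define PTE_SYNC(pte)			/* nothing, for now */
	#define PTE_SYNC_RANGE(pte, cnt)	/* nothing, for now */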
thorpej 574a9cc019 Use a pool cache for PT-PTs. 2002-08-21 21:22:52 +00:00
thorpej 5fddbbe3d5 Do cached memory access to L1 tables, making sure to write-back the
cache after any L1 table modifications.
2002-08-21 18:34:31 +00:00
thorpej 003b8e8bca More local label fixups. 2002-08-17 16:36:31 +00:00
briggs 20267a208f Do not trim 'offset' from 'len' in _bus_dmamap_sync_linear(). 2002-08-17 05:14:10 +00:00
briggs d86c947b8c Inline bus_dma_inrange() and bus_dmamap_sync_*(). 2002-08-17 01:15:15 +00:00
thorpej 50fe583069 Must ... micro ... optimize!
* Save an instruction in the transition from idle to have-process-to-
  switch-to, and eliminate two instructions that cause datadep-stalls
  on StrongARM and XScale (one in each idle block).
* Rearrange some other instructions to avoid datadep-stalls on StrongARM
  and XScale.
* Since cpu_do_powersave == 0 is by far the common case, avoid a
  pipeline flush by reordering the two idle blocks.
2002-08-17 01:08:21 +00:00
thorpej ebff575bc3 * Add a new machdep.powersave sysctl, which controls the use of
the CPU's "sleep" function in the idle loop.
* Default all CPUs to not use powersave, except for the PDA processors
  (SA11x0 and PXA2x0).

This significantly reduces interrupt latency in high-performance
applications (and was good to squeeze another ~10% out of an XScale
IOP on a Gig-E benchmark).
2002-08-16 15:25:53 +00:00
briggs fa81e3d75e * Use local label names (.Lfoo vs. (Lfoo or foo))
* When moving from cpsr, use "cpsr" instead of "cpsr_all" (which is
   provided, but doesn't make sense since mrs doesn't support fields
   like msr does).
2002-08-15 01:37:01 +00:00
thorpej ad73349331 We only need to modify the CPSR's control field, so use cpsr_c rather
than cpsr_all.
2002-08-14 23:23:06 +00:00
chris f4c605201d Tweak asm to avoid a couple of stalls. 2002-08-14 23:07:36 +00:00
thorpej b45159bad0 When doing PREREAD sync operations, if the start and end addresses
of the range are aligned to a cacheline boundary, then do a dcache-inv
operation, rather than a dcache-wbinv operation.

XXX It could be a little smarter (align using wbinv, inv, then finish
up using wbinv), but even this simple change is good for a nearly 40%
improvement in my test case on XScale.
2002-08-14 22:56:55 +00:00
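A sketch of the alignment test described above; arm_dcache_align_mask is assumed to hold (cache line size - 1):

	if (((addr | len) & arm_dcache_align_mask) == 0)
		cpu_dcache_inv_range(addr, len);	/* aligned: invalidate only */
	else
		cpu_dcache_wbinv_range(addr, len);	/* partial lines: wb + inv */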
briggs a957deca48 G/c cowfault. 2002-08-14 21:52:36 +00:00
thorpej 203dd6b325 * Add an ARM32_DMAMAP_COHERENT flag to indicate that a loaded DMA
map contains "coherent" (non-cached in ARM-land) mappings.
* Set ARM32_DMAMAP_COHERENT in the map at the start of a load operation,
  and clear it in _bus_dmamap_load_buffer() if we encounter any cacheable
  mappings.
* In _bus_dmamap_sync(), if the map is marked COHERENT, skip any cache
  flushing.
2002-08-14 20:50:37 +00:00
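A sketch of the short-circuit in _bus_dmamap_sync(); the _dm_flags field name is an assumption:

	if (map->_dm_flags & ARM32_DMAMAP_COHERENT)
		return;		/* coherent mappings need no cache maintenance */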
thorpej d00a4a068d When making a mapping "coherent", clear *ALL* the cache bits, not
just L2_B and L2_C.
2002-08-14 19:21:50 +00:00
thorpej 98d6ec0b89 Add the brutal hack that allows us to limp along using the read/write
cache line allocation policy on XScale CPUs: in pmap_enter(), if the
pmap is the kernel pmap, clear the X-bit in the PTE, thus disabling
read/write-allocate for managed kernel mappings.

Yes, this is ugly.  But it makes userland code run with r/w-allocate,
which is a huge improvement on systems with low core memory performance.
2002-08-13 03:36:30 +00:00
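A sketch of the hack as described; the PTE bit name is hypothetical:

	if (pm == pmap_kernel())
		npte &= ~L2_XSCALE_X;	/* assumed X bit: no r/w-allocate for kernel */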
thorpej d7be866fc8 Rearrange the beginning of cpu_switch() slightly to reduce data-dep
stalls on StrongARM and XScale.
2002-08-12 21:00:12 +00:00
bjh21 664bea62e3 __KERNEL_RCSID 2002-08-12 20:19:04 +00:00
bjh21 ca86069053 When pcb_onfault is set, pass the error code we get from uvm_fault()
(or EFAULT if we never called uvm_fault) to the onfault handler in R0,
in case it wants to use it.
2002-08-12 20:17:37 +00:00
thorpej 3d6f9f69ab Make a slight tweak to register usage to save an instruction. 2002-08-12 19:33:01 +00:00
bjh21 657216ff0f Remove a file which was accidentally resurrected. 2002-08-11 23:20:11 +00:00
bjh21 206c97ccc2 Move the arm32 copystr.S from arch/arm/arm32 to arch/arm/arm and add support
for 26-bit modes (basically saving R14 when we might get a page fault).
Use it on all ARM architectures now.
2002-08-11 23:17:24 +00:00
bjh21 b6228a7d06 New, improved version of copyin(), copyout(), and kcopy() by Allen Briggs.
This version works on both 26-bit and 32-bit machines.  For large copies,
it's up to three times as fast as the old arm32 version and five times as
fast as the old arm26 version.  For small copies it seems to be even faster
(getrusage() is apparently over ten times faster on an ARM610).

Hooray for Allen!
2002-08-11 21:19:12 +00:00
thorpej 76730bd0cc Tidy up pmap_clean_page() a little, and reenable some code that was
disabled previously: Skip cleaning mappings which are read-only, because
the pmap (now) does clean pages on a r/w -> r/o transition.
2002-08-10 00:48:35 +00:00
thorpej 006a578742 Clean up some warts in pmap_protect(). 2002-08-10 00:11:51 +00:00
thorpej 15a5e8f238 cpu_fork(): If PMCs are not enabled in the parent, clear the machine-
dependent PMC state in the child.
2002-08-09 23:44:17 +00:00
thorpej 6ce0a206cc Add an XSCALE_CACHE_READ_WRITE_ALLOCATE option for people who
want to play fast-and-loose.
2002-08-09 21:49:09 +00:00
thorpej 884bc64586 Add some code, conditional on PMAP_ALIAS_DEBUG, that can be used to
hunt for virtual aliases between managed (pmap_enter) and non-managed
(pmap_kenter_pa) mappings.
2002-08-09 18:22:59 +00:00
thorpej c979315325 Reduce stalls on StrongARM and XScale by waiting one insn before using
the result of a load.
2002-08-09 06:18:24 +00:00
thorpej afe3274eed Use ldrbt/strbt. Some other random cleanup. 2002-08-09 06:03:02 +00:00
thorpej 410785d6f0 Use ldrt/strt. 2002-08-09 04:13:20 +00:00
thorpej fdcc8560e4 Speed up bcopy_page() on the XScale slightly by using the "pld"
insn (prefetch) to look ahead to the next chunk while we copy the
current chunk.

This could probably use a bit more tuning.
2002-08-07 16:21:29 +00:00
briggs 0b956d0b8b Implement pmc(9) -- An interface to hardware performance monitoring
counters.  These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface.  Initially, the Intel XScale
counters are the only ones supported.
2002-08-07 05:14:47 +00:00
thorpej 26bc8b27f4 - pmap_remove(): unmap the PTEs *after* we have finished with the
page tables.
- pmap_enter(): if making a mapping for the same PA rw->ro, write-back
  the cache before doing so.
- pmap_clearbit(): if revoking REF on a page, make sure to wbinv the
  cache if the page has write permission, else inv the cache if the page's
  PTE is valid (XXX we actually wbinv in this case, as well, due to lack
  of idcache_inv_range()).  Only flush the TLB if the PTE changed.
2002-08-06 21:43:51 +00:00
thorpej 0886c8cc0f Rearrange the exit path so that we don't do an idcache_wbinv_all *twice*
when a process exits.
2002-08-06 19:20:29 +00:00
thorpej 62d83d05b1 * Pass proc0 to switch_exit(), to make this a little more like the
nathanw_sa branch.
* In switch_exit(), set the outgoing-proc register to NULL (rather than
  proc0) so that we actually use the "exiting process" optimization in
  cpu_switch().
2002-08-06 17:44:35 +00:00
thorpej f7328ddbe7 Add dmoverio. 2002-08-02 00:50:25 +00:00
thorpej dce4476374 Overhaul how DMA ranges work in the ARM bus_dma implementation.
A new "arm32_dma_range" structure now describes a DMA window, with
a system address base, bus address base, and length.  In addition to
providing info about which memory regions are legal for DMA, the new
structure provides address translation support, as well.

As before, if a tag does not list any ranges, then all addresses are
considered valid, and no DMA address translation is performed.

This allows us to remove a large chunk of code which was duplicated and
tweaked slightly (to do the address translation) from the stock ARM
bus_dma in the XScale IOP and ARM Integrator ports.

Test compiled on all ARM platforms, test booted on Intel IQ80321 and Shark.
2002-07-31 17:34:23 +00:00
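The window descriptor might look roughly like this (field names assumed from the description):

	struct arm32_dma_range {
		bus_addr_t	dr_sysbase;	/* system (CPU) address base */
		bus_addr_t	dr_busbase;	/* bus address base */
		bus_size_t	dr_len;		/* length of the window */
	};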
thorpej 79af00bddb Move the calls to uvm_page_physload() out of pmap_bootstrap() and
into platform-specific initialization code, giving platform-specific
code control over which free list a given chunk of memory gets put
onto.

Changes are essentially mechanical.  Test compiled for all ARM
platforms, test booted on Intel IQ80321 and Shark.

Discussed some time ago on port-arm.
2002-07-31 00:20:51 +00:00
thorpej d3aa5664b7 Move the uvm_setpagesize() call to platform-dependent code in preparation
for other changes to pmap_bootstrap().
2002-07-30 16:16:38 +00:00
thorpej 3dcad9ac9e Don't use pmap_kenter_pa() in pmap_map(); doing so causes an assertion
failure in pmap_kenter_pa().
2002-07-30 16:07:23 +00:00
thorpej 3ab4598cc0 Add sysmon at cdev 101. 2002-07-29 18:26:58 +00:00
thorpej 7b652cb939 Change the way that DMA map syncs are done. Instead of remembering
the virtual address for each DMA segment, just cache a pointer to the
original buffer/buftype used to load the DMA map, and use that.  This
lets us shrink the bus_dma_segment_t down from 12 bytes to 8, and the
cache flushing is also more efficient.

Tested on an i80321 -- changes to others are mechanical.
2002-07-28 17:54:05 +00:00
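A sketch of caching the load-time state for later syncs; the field and tag names are assumptions:

	map->_dm_origbuf = buf;			/* remember what was loaded */
	map->_dm_buftype = ARM32_BUFTYPE_MBUF;	/* hypothetical type tag */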
briggs c13ee269dd Handle i80200 step D0 and i80321 step B0 2002-07-22 18:17:42 +00:00
ichiro 6349df15da cdev_tty_init(NIXPCOM,ixpcom) move to end of cdevsw array 2002-07-22 01:12:24 +00:00
simonb 895a23e8ae Add an "#ifndef NIXPCOM" check so that this builds on non-evbarm. 2002-07-20 00:26:51 +00:00
thorpej 3912e469dd Rename cdev_systrace_init() to cdev_clonemisc_init(), so it can
be properly used by any misc. cloning device.  While here, correct
a comment to indicate that "open" is the only entry point and that
everything else is handled with fileops.
2002-07-19 16:38:14 +00:00
ichiro 2255ed4ecb add ixpcom to cdevsw 2002-07-16 14:20:04 +00:00
ichiro 83c0b66d47 Add the CPU id for the "PXA250/210 3rd version CPUcore",
as used by many XScale-based PDAs.
2002-07-10 07:00:50 +00:00
thorpej 47506c123a Add kttcp device. 2002-06-30 23:30:07 +00:00
briggs 1b3d605b4e Remove the complaint about bus_dmamap_destroy() being called for a map
with valid mappings.  bus_dma(9) states: "In the event that the DMA handle contains
a valid mapping, the mapping will be unloaded via the same mechanism
used by bus_dmamap_unload()."  And some drivers do mean to skip the
unload step.
2002-06-28 15:21:00 +00:00
thorpej 43e7ad972b Garbage-collect sigframe references. 2002-06-23 00:16:59 +00:00
christos 3b50728cf4 MD systrace gluons. 2002-06-17 16:32:57 +00:00
thorpej ffe1440f29 Add the CPU ID for the 600MHz i80321 part. 2002-06-07 18:25:28 +00:00
drochner d2b9876081 move initialization of the "struct pglist" returned by uvm_pglistalloc()
from the calling code into uvm_pglistalloc() itself for consistency
and easier error handling
2002-06-02 14:44:35 +00:00
lukem 06de426449 SIMPLEQ rototill:
- implement SIMPLEQ_REMOVE(head, elm, type, field).  whilst it's O(n),
  this mirrors the functionality of SLIST_REMOVE() (the other
  singly-linked list type) and FreeBSD's STAILQ_REMOVE()
- remove the unnecessary elm arg from SIMPLEQ_REMOVE_HEAD().
  this mirrors the functionality of SLIST_REMOVE_HEAD() (the other
  singly-linked list type) and FreeBSD's STAILQ_REMOVE_HEAD()
- remove notes about SIMPLEQ not supporting arbitrary element removal
- use SIMPLEQ_FOREACH() instead of home-grown for loops
- use SIMPLEQ_EMPTY() appropriately
- use SIMPLEQ_*() instead of accessing sqh_first,sqh_last,sqe_next directly
- reorder manual page; be consistent about how the types are listed
- other minor cleanups
2002-06-01 23:50:52 +00:00
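A small usage sketch of the macros mentioned above:

	struct foo {
		SIMPLEQ_ENTRY(foo) f_list;
	};
	SIMPLEQ_HEAD(fooq, foo) head = SIMPLEQ_HEAD_INITIALIZER(head);
	struct foo *f;

	SIMPLEQ_FOREACH(f, &head, f_list) {
		/* visit each element */
	}
	if (!SIMPLEQ_EMPTY(&head))
		SIMPLEQ_REMOVE(&head, SIMPLEQ_FIRST(&head), foo, f_list);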
ichiro 4c034ead9b Make this compile when DEBUG is defined. 2002-05-25 07:58:35 +00:00
chris a9e806ee0c Implement the scheduler lock protocol; this fixes PR arm/10863.
Also add correct locking when freeing pages in pmap_destroy (fix from potr)

This now means that arm32 kernels can be built with LOCKDEBUG enabled. (only tested on cats though)
2002-05-14 19:22:34 +00:00
matt 0a6d35b7ed Nuke local extern label_t *db_recover; it's now in <ddb/db_extern.h> 2002-05-13 20:30:07 +00:00
ichiro be557a5f28 Change IXP12x0 steppings;
define CPU_IXP12X0.
2002-05-12 15:05:41 +00:00
thorpej 22cea0e73c Add IXP1200 steppings. 2002-05-10 17:50:25 +00:00
jdolecek f2f12a240b Update to md(4) changes: memory_disk_size is now md_root_size, and
its type is size_t.
2002-05-05 16:26:30 +00:00
thorpej 860fe83065 Add support for the Intel PXA210 and PXA250. From Hiroyuki Bessho, PR 16617. 2002-05-03 03:28:48 +00:00
rjs 9646735a82 Enable CPU_CLASS_SA1 for SA1100 and SA1110. 2002-05-02 22:57:36 +00:00
thorpej 8bd36dc909 Make a comment describe what the code actually does. 2002-04-25 23:23:23 +00:00
thorpej 2c0a144aa4 * pmap_clean_page(): Clean up a comment.
* pmap_protect(): write back the range when doing a r/w -> r/o
  transition.  (Still leave the block concerned with this in
  pmap_clean_page() disabled, for now.)
* pmap_pte_init_xscale(): Disable read/write-allocate for now, until
  we figure out why sometimes cache lines of NULs get deposited into
  file data.  Also, make sure ECC protection of page table access is
  disabled for now.
* xscale_setup_minidata(): Make sure the mini-data cache is configured
  write-back with read/write-allocate.
2002-04-24 17:35:10 +00:00
wiz d79f4782b6 Complete renaming of pms to opms (it was partly named pms, externally and
internally).  Move arm/iomd/pms* to arm/iomd/opms*. Mechanical change,
tested by cross-compiling a kernel from i386.

Approved by christos.

XXX: What are arm/arm32/conf.c and arm/include/conf.h good for?
2002-04-19 01:04:38 +00:00
thorpej 10c0c20ad4 Default all XScale core processors to the read/write-allocate write-back
cache mode.  Add a new XSCALE_CACHE_WRITE_THROUGH option for people who
are paranoid about the cache-related errata (you *do* have to line up
the planets correctly to trip them, but having the option is useful).
2002-04-12 21:52:45 +00:00
thorpej 32a0860797 Centralize ARM CPU configuration information by adding a new header
file, <arm/cpuconf.h>, which pulls in "opt_cputypes.h" and then defines
the following:
* CPU_NTYPES -- how many CPU types are configured into the kernel.  What
  you really want to know is "== 1" or "> 1".
* Defines ARM_ARCH_2, ARM_ARCH_3, ARM_ARCH_4, ARM_ARCH_5, depending
  on which ARM architecture versions are configured (based on CPU_*
  options).  Also defines ARM_NARCH to determine how many architecture
  versions are configured.
* Defines ARM_MMU_MEMC, ARM_MMU_GENERIC, ARM_MMU_XSCALE depending on
  which classes of ARM MMUs are configured into the kernel, and ARM_NMMUS
  to determine how many MMU classes are configured.

Remove the needless inclusion of "opt_cputypes.h" in several places.
Convert remaining users to <arm/cpuconf.h>.
2002-04-12 18:50:29 +00:00
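A sketch of the style of definition in <arm/cpuconf.h> (the exact option-to-architecture mapping is simplified here):

	#if defined(CPU_SA110) || defined(CPU_SA1100) || defined(CPU_SA1110)
	#define ARM_ARCH_4	1
	#else
	#define ARM_ARCH_4	0
	#endif

	#define ARM_NARCH	(ARM_ARCH_2 + ARM_ARCH_3 + ARM_ARCH_4 + ARM_ARCH_5)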
thorpej 27d98ca694 Remove the Control register handling from arm32_vector_init(). Apparently,
the ARM6 and ARM7 do completely the wrong thing if you read this register,
so we have to handle this a different way.
2002-04-10 21:45:43 +00:00
thorpej 59c9e94b72 vm_offset_t -> vaddr_t,paddr_t 2002-04-10 19:35:22 +00:00
thorpej ad2350dccf On XScale processors where we use write-back caching, use a
read/write-allocate line allocation policy.

On the i80321, this improves nearly every lmbench benchmark, dramatically
so for the ones that are sensitive to memory bandwidth (100-300% improvement
for these).
2002-04-10 17:39:31 +00:00
thorpej 2b924304ab Add a new function, pmap_alloc_ptpt(), that allocates the PTPT and
maps it the way we want, rather than using uvm_km_zalloc() and playing
the "revoke cacheability" song-and-dance.
2002-04-10 17:08:13 +00:00
thorpej cad393fa1c pmap_alloc_l1pt(): Just enter the mappings for the L1 table by
hand, rather than calling pmap_kenter_pa() and then revoking
cacheability in the PTE.
2002-04-10 15:56:21 +00:00
thorpej cd0e28f1e7 Use L2_S_CACHE_MASK in places where we revoke cacheability. 2002-04-10 15:44:23 +00:00
thorpej 668547d841 pmap_kenter_pa(): Obey the "prot" argument, rather than simply making
all mappings r/w (!!).
2002-04-10 04:40:58 +00:00
thorpej 6e52cbf89e In pmap_copy_page_xscale(), put the source page in the mini-data
cache, as well.  The mini-data cache is 2-way, so src and dst won't
clobber each other, and the smallness of the cache doesn't matter,
since we access each page once sequentially.

While we still have to do the initial clean of the source page, this
saves another 4K of main D$ pollution, and also means we don't have
to do 2 cache passes after the copy is complete (i.e. we can skip the
invalidation of the source page in the main cache, since it's no longer
there).
2002-04-10 01:30:42 +00:00
thorpej 2092e78cec Add separate pmap_{zero,copy}_page() functions for generic ARM
vs. XScale.  Use the mini-data cache for the destination on XScale,
thus saving tossing out 4K of possible-useful data from the main
data cache each time.

This significantly improves every test in lmbench.
2002-04-10 00:45:43 +00:00
thorpej da162bee90 * Move the code that cleans the XScale mini-data cache into its
own function.
* Add a new function which sets up the mini-data cache clean area
  properly.
2002-04-09 23:44:00 +00:00
thorpej 1b20a04772 * Split pte_cache_mode into pte_l1_s_cache_mode, pte_l2_l_cache_mode,
and pte_l2_s_cache_mode.  The cache-meaningful bits are different
  for these descriptor types on some processor models.
* Add pte_*_cache_mask, corresponding to each above, which has a mask
  of the cache-meaningful bits, and define those for generic and XScale
  MMU classes.  Note, the L2_S_CACHE_MASK_xscale definition requires
  use of the Extended Small Page L2 descriptor (the "X" bit overlaps
  with AP bits otherwise).
2002-04-09 22:37:00 +00:00
thorpej c535f4ffc4 Define 2 classes of ARM MMUs:
1. Generic (compatible with ARM6)
2. XScale (can be used as generic, but also has certain nifty extensions).

Define abstract PTE bit definitions for each MMU class.  If only one MMU
class is configured into the kernel (based on CPU_* options), then we
get the constants for that MMU class.  Otherwise we indirect through
variables set up via set_cpufuncs().

XXX The XScale bits are currently the same as the generic bits.  Baby steps.
2002-04-09 21:00:42 +00:00
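A sketch of the constant-vs-indirect scheme described above (macro and variable names illustrative):

	#if ARM_NMMUS == 1 && ARM_MMU_GENERIC == 1
	#define L2_S_CACHE_MODE	L2_S_CACHE_MODE_generic	/* compile-time constant */
	#else
	extern pt_entry_t pte_l2_s_cache_mode;	/* set up via set_cpufuncs() */
	#define L2_S_CACHE_MODE	pte_l2_s_cache_mode
	#endif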