Commit Graph

38 Commits

Author SHA1 Message Date
scw 52c15bbd20 Don't drop to spl0 in cpu_switch/cpu_switchto. Do it in the idle loop
instead.

With this change, we no longer need to save the current interrupt level
in the switchframe. This is no great loss since both cpu_switch and
cpu_switchto are always called at splsched, so the process' spl is
effectively saved somewhere in the callstack.

This fixes an evbarm problem reported by Allen Briggs:

        lwp gets into sa_switch -> mi_switch with newl != NULL
            when it's the last element on the runqueue, so it
            hits the second bit of:
                if (newl == NULL) {
                        retval = cpu_switch(l, NULL);
                } else {
                        remrunqueue(newl);
                        cpu_switchto(l, newl);
                        retval = 0;
                }

        mi_switch calls remrunqueue() and cpu_switchto()

        cpu_switchto unlocks the sched lock
        cpu_switchto drops CPU priority
        softclock is received
        schedcpu is called from softclock
        schedcpu hits the first if () {} block here:
                if (l->l_priority >= PUSER) {
                        if (l->l_stat == LSRUN &&
                            (l->l_flag & L_INMEM) &&
                            (l->l_priority / PPQ) != (l->l_usrpri / PPQ)) {
                                remrunqueue(l);
                                l->l_priority = l->l_usrpri;
                                setrunqueue(l);
                        } else
                                l->l_priority = l->l_usrpri;
                }

        Since mi_switch has already run remrunqueue, the LWP has been
            removed but not put back on any queue, so the remrunqueue()
            in schedcpu panics.
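
A minimal sketch of the resulting shape, assuming a simplified idle loop
(illustrative C only, not the actual NetBSD code; spl0/splsched are the
real priority calls, the loop structure is an assumption):

        /*
         * Illustrative only: cpu_switch()/cpu_switchto() now stay at
         * splsched for their whole run, so the switchframe no longer
         * needs to record an spl.  Only the idle loop, which holds no
         * scheduler state, opens the interrupt window.
         */
        for (;;) {
                /* sched lock already released at this point */
                spl0();             /* take pending interrupts (softclock etc.) */
                /* ... wait until a runnable LWP appears ... */
                (void)splsched();   /* back to a switch-safe level */
                /* ... pick the next LWP and switch to it ... */
        }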
2003-10-23 08:59:10 +00:00
scw 63d24b09fd A couple of XScale tweaks:
 - Use the "clz" instruction to pick a run-queue, instead of using the
   ffs-by-table-lookup method (see the sketch below).
 - Use strd instead of stmia where possible.
 - Use multiple ldr instructions instead of ldmia where possible.
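
A hedged C illustration of the clz trick (the real change is in ARM
assembly; the whichqs bitmap name and the non-empty-bitmap precondition
are assumptions):

        #include <stdint.h>

        /*
         * ffs-style "lowest set bit" via count-leading-zeros:
         * x & -x isolates the lowest set bit, and 31 - clz() turns it
         * into a bit index.  On XScale, GCC's __builtin_clz() compiles
         * to a single "clz" instruction, replacing the table lookup.
         * Caller must ensure whichqs != 0.
         */
        static inline int
        pick_runqueue(uint32_t whichqs)
        {
                return 31 - __builtin_clz(whichqs & -whichqs);
        }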
2003-10-13 21:44:27 +00:00
martin d505b18964 Make sure to include opt_foo.h if a defflag option FOO is used. 2003-06-23 11:00:59 +00:00
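
A minimal illustration of the convention enforced by the commit above
(FOO is a placeholder option name):

        /*
         * A "defflag FOO" in the kernel config machinery makes
         * config(8) generate opt_foo.h, which #defines FOO when the
         * option is enabled.  Code testing the option must include
         * that header first, or the #ifdef silently never fires.
         */
        #include "opt_foo.h"

        #ifdef FOO
        /* option-dependent code goes here */
        #endif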
chris 93632a0574 Fix for port-arm/21962. Rather than fixing the #ifndef spl0, I removed
the test, as spl0 is actually a macro for splx(0). The code now calls
splx(0) directly.

(Note: building with the #ifdef fixed caused the build to fail on a
GENERIC acorn32 kernel.)
2003-06-23 09:05:22 +00:00
kristerw 28f5335a9f Fix LINTSTUB comments. 2003-05-31 01:40:05 +00:00
thorpej c8bed530ac Remove #ifdefs supporting the old pmap, switching fully to the new. 2003-05-21 18:04:42 +00:00
chris 70a9a33cc8 Remove a strh. I don't think it's available on ARMv3, and it doesn't work
on acorn32 machines with an SA110 in them, as the bus doesn't support
halfword transfers.
2003-04-26 17:50:21 +00:00
thorpej bbef46a7e9 Some ARM32_PMAP_NEW-related cleanup:
* Define a new "MMU type", ARM_MMU_SA1.  While the SA-1's MMU is basically
  compatible with the generic one, the SA-1 cache does not have a write-through
  mode, and it is useful to have an indication of this.
* Add a new PMAP_NEEDS_PTE_SYNC indicator, and try to evaluate it at
  compile time (see the sketch after this list).  We evaluate it like so:
  - If SA-1-style MMU is the only type configured -> 1
  - If SA-1-style MMU is not configured -> 0
  - Otherwise, defer to a run-time variable.
  If PMAP_NEEDS_PTE_SYNC might evaluate to true (SA-1 only or run-time
  check), then we also define PMAP_INCLUDE_PTE_SYNC so that e.g. assembly
  code can include the necessary run-time support.  PMAP_INCLUDE_PTE_SYNC
  largely replaces the ARM32_PMAP_NEEDS_PTE_SYNC manual setting Steve
  included with the original new pmap.
* In the new pmap, make pmap_pte_init_generic() check to see if the CPU
  has a write-back cache.  If so, init the PT cache mode to C=1,B=0 to get
  write-through mode.  Otherwise, init the PT cache mode to C=1,B=1.
* Add a new pmap_pte_init_arm8().  Old pmap, same as generic.  New pmap,
  sets page table cacheability to 0 (ARM8 has a write-back cache, but
  flushing it is quite expensive).
* In the new pmap, make pmap_pte_init_arm9() reset the PT cache mode to
  C=1,B=0, because the write-back check in the generic routine gets it wrong
  for ARM9: we use write-through mode all the time on ARM9 right now.  (What
  this really tells me is that the test for a write-through cache is less
  than perfect, but we can fix that later.)
* Add a new pmap_pte_init_sa1().  Old pmap, same as generic.  New pmap,
  does generic initialization, then resets page table cache mode to
  C=1,B=1, since C=1,B=0 does not produce write-through on the SA-1.
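
A hypothetical sketch of the compile-time evaluation described in the
PMAP_NEEDS_PTE_SYNC item above (the preprocessor conditions and the
variable name are assumptions, not the actual source):

        #if defined(SA1_STYLE_MMU_ONLY)         /* SA-1 style is the only type */
        #define PMAP_NEEDS_PTE_SYNC     1
        #define PMAP_INCLUDE_PTE_SYNC
        #elif !defined(SA1_STYLE_MMU)           /* no SA-1 style configured */
        #define PMAP_NEEDS_PTE_SYNC     0
        #else                                   /* mixed: defer to run time */
        extern int pmap_needs_pte_sync;
        #define PMAP_NEEDS_PTE_SYNC     pmap_needs_pte_sync
        #define PMAP_INCLUDE_PTE_SYNC
        #endif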
2003-04-22 00:24:48 +00:00
scw 41a1932e58 Add the generic arm32 bits of the new pmap, contributed by Wasabi Systems.
Some features of the new pmap are:

 - It allows L1 descriptor tables to be shared efficiently between
   multiple processes. A typical "maxusers 32" kernel, where NPROC is set
   to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
   with just 4 L1s. This completely solves the problem of running out
   of contiguous physical memory for allocating new L1s at runtime on a
   busy system.

 - Much improved cache/TLB management "smarts". This change ripples
   out to encompass the low-level context switch code, which is also
   much smarter about when to flush the cache/TLB, and when not to.

 - Faster allocation of L2 page tables and associated metadata thanks,
   in part, to the pool_cache enhancements recently contributed to
   NetBSD by Wasabi Systems.

 - Faster VM space teardown due to accurate reference tracking of L2
   page tables.

 - Better/faster cache-alias tracking.

The new pmap is enabled by adding options ARM32_PMAP_NEW to the kernel
config file, and making the necessary changes to the port-specific
initarm() function. Several ports have already been converted and will
be committed shortly.
2003-04-18 11:08:24 +00:00
thorpej 23bc250391 Merge the nathanw_sa branch. 2003-01-17 21:55:23 +00:00
bjh21 a531a4ae8e Undo recent cpu_switch register usage changes in order to decrease nathanw_sa
merge pain.
2002-10-19 00:10:53 +00:00
bjh21 7dd8880e90 The grand cpu_switch register reshuffle!
In particular, use r8 to hold the old process, and r7 for medium-term
scratch, saving r0-r3 for things we don't need saved over function
calls.  This gets rid of five register-to-register MOVs.
2002-10-18 23:06:33 +00:00
bjh21 3d1b6867f0 In cpu_switch(), stack more registers at the start of the function,
and hence save fewer into the PCB.  This should give me enough free
registers in cpu_switch to tidy things up and support MULTIPROCESSOR
properly.  While we're here, make the stacked registers into an
APCS stack frame, so that DDB backtraces through cpu_switch() will
work.

This also affects cpu_fork(), which has to fabricate a switchframe and
PCB for the new process.
2002-10-18 21:32:57 +00:00
bjh21 441e8907fe Switch to using the MI C versions of setrunqueue() and remrunqueue().
GCC produces almost exactly the same instructions as the hand-assembled
versions, albeit in a different order.  It even found one place where it
could shave one off.  Its insistence on creating a stack frame might slow
things down marginally, but not, I think, enough to matter.
2002-10-15 20:53:38 +00:00
bjh21 d599df9587 Continue the " - . - 8" purge. Specifically:
add	rd, pc, #foo - . - 8		->	adr	rd, foo
ldr	rd, [pc, #foo - . - 8]		->	ldr	rd, foo

Also, when saving the return address for a function pointer call, use
"mov lr, pc" just before the call unless the return address is somewhere
other than just after the call site.

Finally, a few obvious little micro-optimisations like using LDR directly
rather than ADR followed by LDR, and loading directly into PC rather than
bouncing via R0.
2002-10-14 22:32:50 +00:00
bjh21 3d91ec9fdd Instead of "add rd, pc, #foo - . - 8", use either "adr rd, foo" or (where
appropriate) "mov lr, pc".  This makes things slightly less confusing and
ugly.
2002-10-13 14:54:47 +00:00
bjh21 a7385c575f Move curpcb into struct cpu_info in MULTIPROCESSOR kernels. 2002-10-12 12:20:08 +00:00
bjh21 6ae19cc8cd Use ADR rather than an explicit ADD from PC. 2002-10-09 22:28:03 +00:00
bjh21 67ba9f99bf Remove an outdated register assignment comment. 2002-10-08 23:48:24 +00:00
bjh21 3832819227 Minimal changes to allow a kernel with "options MULTIPROCESSOR" to compile
and boot multi-user on a single-processor machine.  Many of these changes
are wildly inappropriate for actual multi-processor operation, and correcting
this will be my next task.
2002-10-05 13:46:57 +00:00
thorpej 212cb9f78d Add machine-dependent bits of RAS for arm32. 2002-08-31 03:07:32 +00:00
thorpej 003b8e8bca More local label fixups. 2002-08-17 16:36:31 +00:00
thorpej 50fe583069 Must ... micro ... optimize!
* Save an instruction in the transition from idle to have-process-to-
  switch-to, and eliminate two instructions that cause datadep-stalls
  on StrongARM and XScale (one in each idle block).
* Rearrange some other instructions to avoid datadep-stalls on StrongARM
  and XScale.
* Since cpu_do_powersave == 0 is by far the common case, avoid a
  pipeline flush by reordering the two idle blocks.
2002-08-17 01:08:21 +00:00
thorpej ebff575bc3 * Add a new machdep.powersave sysctl, which controls the use of
the CPU's "sleep" function in the idle loop.
* Default all CPUs to not use powersave, except for the PDA processors
  (SA11x0 and PXA2x0).

This significantly reduces interrupt latency in high-performance
applications (and was good for squeezing another ~10% out of an XScale
IOP on a Gig-E benchmark).
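
A hypothetical sketch of flipping the knob from userland, assuming the
standard sysctlbyname(3) interface:

        #include <sys/param.h>
        #include <sys/sysctl.h>
        #include <err.h>

        int
        main(void)
        {
                int on = 1;     /* 1 = use the CPU sleep function when idle */

                if (sysctlbyname("machdep.powersave", NULL, NULL,
                    &on, sizeof(on)) == -1)
                        err(1, "machdep.powersave");
                return 0;
        }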
2002-08-16 15:25:53 +00:00
briggs fa81e3d75e * Use local label names (.Lfoo vs. (Lfoo or foo))
* When moving from cpsr, use "cpsr" instead of "cpsr_all" (which is
   provided, but doesn't make sense since mrs doesn't support fields
   like msr does).
2002-08-15 01:37:01 +00:00
thorpej ad73349331 We only need to modify the CPSR's control field, so use cpsr_c rather
than cpsr_all.
2002-08-14 23:23:06 +00:00
chris f4c605201d Tweak asm to avoid a couple of stalls. 2002-08-14 23:07:36 +00:00
thorpej d7be866fc8 Rearrange the beginning of cpu_switch() slightly to reduce data-dep
stalls on StrongARM and XScale.
2002-08-12 21:00:12 +00:00
thorpej 3d6f9f69ab Make a slight tweak to register usage to save an instruction. 2002-08-12 19:33:01 +00:00
thorpej 0886c8cc0f Rearrange the exit path so that we don't do an idcache_wbinv_all *twice*
when a process exits.
2002-08-06 19:20:29 +00:00
thorpej 62d83d05b1 * Pass proc0 to switch_exit(), to make this a little more like the
nathanw_sa branch.
* In switch_exit(), set the outgoing-proc register to NULL (rather than
  proc0) so that we actually use the "exiting process" optimization in
  cpu_switch().
2002-08-06 17:44:35 +00:00
chris a9e806ee0c Implement the scheduler lock protocol; this fixes PR arm/10863.
Also add correct locking when freeing pages in pmap_destroy (fix from potr).

This now means that arm32 kernels can be built with LOCKDEBUG enabled
(only tested on cats, though).
2002-05-14 19:22:34 +00:00
thorpej 4e990d9ccb Overhaul of the ARM cache code. This is mostly a simplification
pass.  Rather than providing a whole slew of cache operations that
aren't ever used, distill them down to some useful primitives:

	icache_sync_all         Synchronize I-cache
	icache_sync_range       Synchronize I-cache range

	dcache_wbinv_all        Write-back and Invalidate D-cache
	dcache_wbinv_range      Write-back and Invalidate D-cache range
	dcache_inv_range        Invalidate D-cache range
	dcache_wb_range         Write-back D-cache range

	idcache_wbinv_all       Write-back and Invalidate D-cache,
				Invalidate I-cache
	idcache_wbinv_range     Write-back and Invalidate D-cache,
				Invalidate I-cache range

Note: This does not yet include an overhaul of the actual asm files
that implement the primitives.  Instead, we've provided a safe default
for each CPU type, and the individual CPU types can now be optimized
one at a time.
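
As a sketch, the primitives might be declared like so (the operation
names come from the list above; the vaddr_t/vsize_t parameter types are
an assumption):

        /* Sketch only: parameter types assumed, not taken from the commit. */
        void    icache_sync_all(void);
        void    icache_sync_range(vaddr_t va, vsize_t len);

        void    dcache_wbinv_all(void);
        void    dcache_wbinv_range(vaddr_t va, vsize_t len);
        void    dcache_inv_range(vaddr_t va, vsize_t len);
        void    dcache_wb_range(vaddr_t va, vsize_t len);

        void    idcache_wbinv_all(void);
        void    idcache_wbinv_range(vaddr_t va, vsize_t len);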
2002-01-25 19:19:22 +00:00
thorpej a2c8fc94fe Provide a way for platforms to move away from the old RiscPC-centric
interrupt code.  Garbage-collect some unused stuff.
2001-11-29 17:14:02 +00:00
chris 165b023373 Give the idle loop a non-profiled entry, which means it appears in profile
info correctly (rather than all its time being attributed to remrunqueue).
switch_exit only needs to take one parameter; it loads the value of proc0
into R1 itself.
Fix up some comments to reflect the real state of things.
Tweak a couple of bits of asm to avoid a load delay.
Remove excess code for setting curpcb and curproc.
2001-11-19 20:38:58 +00:00
chris 8298c55eab Correct comments for the ffs algorithm (it isn't using register r0). 2001-11-11 22:07:41 +00:00
matt d75fe4fc1e Fix .type which uses wrong symbol name. 2001-09-16 17:38:08 +00:00
chris 27f96e8440 Move the generic arm32 files into arm/arm32 from arm32/arm32, tested kernel builds on cats and riscpc. 2001-07-28 13:28:03 +00:00