- move per-VP data into struct sadata_vp, referenced from l->l_savp
  (sketched after this list)
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of woken LWPs that ran on this VP before sleeping
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request a new VP or wake up an idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP
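For illustration, the per-VP data above might be laid out as follows.
This is a hedged sketch only: apart from l->l_savp and sa_cpu, every
field name here is an assumption, not the committed layout.

	/* Hypothetical sketch of the per-VP structure. */
	struct sadata_vp {
		int		savp_id;	/* VP id (exported as sa_cpu) */
		struct simplelock savp_lock;	/* lock on this VP's data */
		struct lwp	*savp_lwp;	/* LWP currently on this VP */
		struct lwp	*savp_blocker;	/* recently blocked LWP */
		LIST_HEAD(, lwp) savp_woken;	/* woken LWPs that ran here */
		vaddr_t		savp_faultaddr;	/* fault address for upcall */
		LIST_HEAD(, lwp) savp_lwpcache;	/* cached LWPs for upcalls */
		SIMPLEQ_HEAD(, sadata_upcall) savp_upcalls; /* upcall queue */
	};

	struct sadata {
		/* ... */
		int	sa_concurrency;		/* current concurrency */
		int	sa_requested;		/* requested concurrency */
		LIST_HEAD(, sadata_vp) sa_vps;	/* one entry per VP */
	};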
Just before removing the vector page mapping, switch to the kernel
pmap's L1/vector page mapping so as not to pull the rug out from
under ourselves.
To prevent the stale L1/vector page mapping from being restored by
cpu_switch, replace the relevant fields of the dying process' pcb
with those of lwp0's pcb.
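A hedged sketch of that pcb surgery, with field names modelled on the
arm32 struct pcb; the exact set of fields copied is an assumption:

	/* Hand the dying process lwp0's L1/vector-page state so a
	 * later cpu_switch cannot restore the stale mapping. */
	struct pcb *pcb = &l->l_addr->u_pcb;
	const struct pcb *pcb0 = &lwp0.l_addr->u_pcb;

	pcb->pcb_pagedir = pcb0->pcb_pagedir;	/* kernel L1 table */
	pcb->pcb_pl1vec  = pcb0->pcb_pl1vec;	/* vector page L1 slot */
	pcb->pcb_l1vec   = pcb0->pcb_l1vec;	/* vector page L1 entry */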
Rearrange the process exit path so that resources no longer need to be
freed from a separate process context ('reaper').
From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans up all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the last remaining lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit
uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excess u-area memory;
this is called by the parent within wait4(), or by the pagedaemon on memory
shortage. uvm_uarea_free() is now a private function within uvm_glue.c.
MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.
g/c now-unneeded routines and variables, including the reaper kernel thread
(the resulting exit ordering is sketched below)
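The sequence below follows the notes above; the helper name and the
body itself are illustrative assumptions, not the committed code:

	void
	exit_last_lwp(struct lwp *l)	/* hypothetical helper */
	{
		struct proc *p = l->l_proc;

		/* Still able to block: release the address space first. */
		pmap_deactivate(l);
		uvmspace_free(p->p_vmspace);

		/* MD cleanup (FPU state etc.); the last operation
		 * that may block. */
		cpu_lwp_free(l, 1);

		/* Immediately available for pickup by the parent. */
		p->p_stat = SZOMB;
		wakeup(p->p_pptr);

		/* MD code switches away, then calls lwp_exit2(l). */
		cpu_exit(l);
	}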
Replace the traditional buffer memory management -- based on fixed per-buffer
virtual memory reservation and a private pool of memory pages -- with a scheme
based on memory pools.
This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.
On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
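The allocation side of such a scheme might look like this; the names
mirror the idea rather than the committed code, and are assumptions:

	#define MEMPOOL_INDEX_OFFSET	10	/* smallest pool: 1 kB */
	#define NMEMPOOLS		4	/* 1k, 2k, 4k, 8k classes */
	static struct pool bmempools[NMEMPOOLS];

	static void *
	buf_malloc(size_t size)
	{
		u_int n = MEMPOOL_INDEX_OFFSET;

		/* round up to the smallest size class that fits */
		while ((size_t)(1 << n) < size)
			n++;
		return pool_get(&bmempools[n - MEMPOOL_INDEX_OFFSET],
		    PR_WAITOK);
	}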
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc. routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
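For the record, registering a node with the new tree looks roughly
like this (current-style sysctl_createv() usage; the "mything" node
itself is purely hypothetical):

	static int mything;

	SYSCTL_SETUP(sysctl_mything_setup, "example sysctl subtree setup")
	{
		sysctl_createv(clog, 0, NULL, NULL,
		    CTLFLAG_READWRITE,
		    CTLTYPE_INT, "mything",
		    SYSCTL_DESCR("an example integer knob"),
		    NULL, 0, &mything, 0,
		    CTL_KERN, CTL_CREATE, CTL_EOL);
	}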
the idle loop. They seem to have gone AWOL sometime in the past.
Fixes port-arm/23390.
- While here, tidy up the idle loop.
- Add a cheap DIAGNOSTIC check for run queue sanity.
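The sanity check might have roughly this shape; this is a hedged C
rendition of an assembler-level test, and the exact condition is an
assumption:

	#ifdef DIAGNOSTIC
	/* if whichqs claims queue q has work, its list had better
	 * be non-empty */
	if (sched_whichqs != 0) {
		int q = ffs(sched_whichqs) - 1;
		if (sched_qs[q].ph_link == (struct lwp *)&sched_qs[q])
			panic("idle: runqueue %d empty, bit set", q);
	}
	#endif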
returns non-zero and we want to shortcut out. This avoids a bogus pagefault
condition being detected in sa_switch().
Many thanks to Christian Limpach for finding this, obviating my band-aid
patch to kern_sa.c (posted on tech-kern).
the call to data_abort_fixup() as the fixup routines also try to
de-reference the fault pc.
- If a fault came from kernel mode, and the fault address looks to be in
the kernel's address space, and pcb_onfault is *set*, check the
instruction which caused the fault. If it's LDR{B,}T or STR{B,}T
then one of the copy in/out routines is trying to read/write a
kernel address with the wrong privilege. If that address is actually
mapped, we could end up in an infinite loop because we failed to
notice that it's really a 'user mode' access (see the sketch below).
Yay for "crashme".
I suspect this also fixes PR port-arm/23052.
Note: This *could* be fixed by adding sanity checks to copyin et al,
but that would add extra overhead to the non-error path...
- Fix a couple of __predict_false cases.
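The instruction test itself is compact. This sketch uses the standard
ARM encoding (single data transfer with P clear and W set matches
LDR{B,}T/STR{B,}T); the helper name and surrounding logic are
assumptions:

	static inline int
	insn_is_unpriv_transfer(uint32_t insn)
	{
		/* matches LDRT/LDRBT/STRT/STRBT */
		return ((insn & 0x0d200000) == 0x04200000);
	}

	/* in data_abort_handler(), roughly: */
	if (!TRAP_USERMODE(tf) && pcb->pcb_onfault != NULL &&
	    insn_is_unpriv_transfer(*(uint32_t *)tf->tf_pc)) {
		/* copyin/copyout hit a kernel VA with user privilege:
		 * take the onfault path instead of looping forever */
		tf->tf_r0 = EFAULT;
		tf->tf_pc = (register_t)pcb->pcb_onfault;
		return;
	}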
Since the instruction at the fault PC must be examined to determine if a
fault is read or write, make sure tf->tf_pc is 32-bit aligned before
dereferencing it.
Otherwise, deliver an illegal instruction signal to the process. We don't
support execution of Thumb code at this time.
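A hedged sketch of the guard; the error-delivery details are
assumptions:

	/* tf->tf_pc is dereferenced below; a Thumb-mode (2-byte
	 * aligned) or garbage PC must not be touched */
	if ((tf->tf_pc & 3) != 0) {
		trapsignal(l, SIGILL, tf->tf_pc);  /* no Thumb support */
		return;
	}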
Remove p_raslock and rename p_lwplock to p_lock (one lock is enough).
Simplify the window test when adding a RAS, and correct the test against
VM_MAXUSER_ADDRESS (sketched below).
Avoid unpredictable branch in i386 locore.S
(pad fields left in struct proc to avoid kernel bump)
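The simplified window test amounts to a couple of comparisons; a
sketch, assuming caddr_t-style arguments:

	/* reject empty or wrapping windows, and anything that
	 * extends beyond user VA space */
	if (len == 0 || addr + len < addr ||
	    (vaddr_t)(addr + len) > VM_MAXUSER_ADDRESS)
		return (EINVAL);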
- Assume a permission fault is always the result of an attempted
  write, so there's no need to disassemble the opcode (see the sketch
  after this list).
(as discussed with Richard Earnshaw/Jason Thorpe a week or two ago)
- Split out non-MMU data aborts into separate functions, and deal
correctly with XScale imprecise aborts. Specifically, the old code
made no attempt to handle the double abort faults which can occur
as a result of two consecutive external (imprecise) aborts. This
was easy to provoke by read(2)ing from a /dev/mem offset which caused
an external abort. With the old code, this would bring the system
down instantly, with little clue as to why. (hint: tf_spsr held
PSR_ABT32_MODE...)
- Re-write badaddr_read() to use pcb_onfault instead of adding extra
overhead to data_abort_handler(). A side effect of this is that it
now benefits from the XScale double abort recovery.
- Invoke the cpu-specific prefetch/data abort fixup routines only if
the host cpu actually needs it. On other cpus, the code is optimised
away.
- Sprinkle __predict_{false,true} in all the right places.
- G/C some excess debugging baggage.
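The read/write classification therefore collapses to a simple test; in
this sketch IS_PERMISSION_FAULT() stands in for the FSR check and is
an assumption:

	if (IS_PERMISSION_FAULT(fsr))
		/* permission faults can only come from writes */
		ftype = VM_PROT_WRITE;
	else
		/* translation faults etc. are treated as reads;
		 * uvm_fault() sorts out the rest */
		ftype = VM_PROT_READ;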
Move the common exception entry/exit code into assembler macros, eliminating
needless duplication.
Additionally, merge AST handling into the same code.
exception.S and the generic irq_dispatch.S routines have been updated
to use the macros.
XXX: I have patches for the non-generic IRQ dispatch routines, but they
need testing by someone with hardware.
Stop relying on the cache being flushed on every context switch as an
indicator that a mapping is not resident in the cache.
Instead, use the per-pmap flag maintained by the cpu_switch/pmap code
(sketched below).
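In places that used the old assumption, the test becomes a flag check;
pm_cstate is the arm32 name, but the usage here is an illustrative
assumption:

	if (pm->pm_cstate.cs_cache_d) {
		/* mapping may still be resident in the data cache */
		cpu_dcache_wbinv_range(va, PAGE_SIZE);
	}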
Enable alignment faults on arm32 for both kernel and userland.
If COMPAT_15 and EXEC_AOUT are defined, support per-process
alignment checking where AFLTs are always enabled when running
kernel code and userland ELF binaries, and dynamically disabled/
enabled when switching to/from a.out binaries. This is necessary
in order to execute older a.out binaries, where gcc made
deliberate use of misaligned loads under certain circumstances.
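A hedged sketch of the toggle; PROC_AFLT_DISABLED and aflt_set() are
assumptions standing in for the real MD flag and control-register
write:

	#if defined(COMPAT_15) && defined(EXEC_AOUT)
	if (p->p_md.md_flags & PROC_AFLT_DISABLED)
		aflt_set(0);	/* a.out: permit misaligned loads */
	else
		aflt_set(1);	/* ELF and kernel: AFLTs enabled */
	#endif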
instead.
With this change, we no longer need to save the current interrupt level
in the switchframe. This is no great loss since both cpu_switch and
cpu_switchto are always called at splsched, so the process' spl is
effectively saved somewhere in the callstack.
This fixes an evbarm problem reported by Allen Briggs:
lwp gets into sa_switch -> mi_switch with newl != NULL
when it's the last element on the runqueue, so it
hits the second bit of:
    if (newl == NULL) {
            retval = cpu_switch(l, NULL);
    } else {
            remrunqueue(newl);
            cpu_switchto(l, newl);
            retval = 0;
    }
mi_switch calls remrunqueue() and cpu_switchto()
cpu_switchto unlocks the sched lock
cpu_switchto drops CPU priority
softclock is received
schedcpu is called from softclock
schedcpu hits the first if () {} block here:
    if (l->l_priority >= PUSER) {
            if (l->l_stat == LSRUN &&
                (l->l_flag & L_INMEM) &&
                (l->l_priority / PPQ) != (l->l_usrpri / PPQ)) {
                    remrunqueue(l);
                    l->l_priority = l->l_usrpri;
                    setrunqueue(l);
            } else
                    l->l_priority = l->l_usrpri;
    }
Since mi_switch() has already run remrunqueue(), the LWP has been
removed from the run queue but has not been put back on any queue, so
the second remrunqueue() call panics.