Complete support for FPU_VFP.
fpregs now contains vfpreg.
XXX vfpreg only has space for 16 64-bit FP registers though VFPv3 and later
have 32 64-bit FP registers.
since TPIDRPRW is the cp15 register name.
Initialize it early in start along with CI_ARM_CPUID.
Remove other initializations.
We alays have ci_curlwp.
Enable TIPRPRW_IS_CURCPU in std.beagle.
[tested on a beaglboard (cortex-a8)]
Add emulation of instruction which save/restore the VFP FPSCR.
Add a sysarch hook to VFP FPSCR manipulation.
[The emulation will be used by libc to store/fetch exception modes and
rounding mode on a per-thread basis.]
32-bits. But the newer ARM ABI AAPCS changes the alignment of 64-bit
fields so structures need to copied in and out to deal with the alignment
change. This is a kludge but makes debugging of AAPCS support much easier.
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
CPU_CORTEXA8 everywhere since there more types of Cortex than just the A8.
CPU_CORTEXA8 still exists but causes CPU_CORTEX to be defined.
Add CPU_CORTEXA9 as well. Use .arch armv7a to get us the isb/dsb
instructions.
Test booted to root device prompt on a Beagleboard.
All ARM kernels successfully test built.
into modules. By and large this commit:
- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
In fact it's mostly the same code, with a different stub on it.
On a cats the regress/sys/net/in_cksum tests show that it takes around
50-60% of the time the C version takes. In some cases it takes as low as
20%.
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one-complement sum.
The default implementation is moderate fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.
This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.
TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.
NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
Because pre-v6 ARM lacks support for an atomic compare-and-swap, we
implement _lock_cas() as a restartable atomic squence that is checked
in the IRQ handler right before AST processing. (This is safe because,
for all practical purposes, there are no SMP pre-v6 ARM systems.)
This can serve as a model for other non-MP platforms that lack the
necessary atomic operations for mutexes (SuperH, for example).
Upshots of this change:
- kmutex_t is now down to 8 bytes on ARM; about as good as we can get.
- ARM2 systems don't have to trap and emulate SWP or SWPB for mutexes.
The acorn26 port is not updated by this commit to do the LOCK_CAS_CHECK.
That is left as an exercise for the port maintainer.
Reviewed and tested by Matt Thomas.
kernel config option CPU_XSCALE_PXA2X0 is now obsoleted by
CPU_XSCALE_PXA250 and CPU_XSCALE_PXA270. If both of them are defined,
CPU is determined run-time.
This code takes no advantage of any 'new' features provided by
architecture 6 devices (such as physically tagged caches or new
MMU features), and basically runs the chip in a 'legacy v5' mode.
alignment fault checking if necessary.
This option gets the acorn32 port working again.
XXX: Richard Earnshaw suggested enabling alignment faults for
XXX: userland only on acorn32. Need to investigate this.
Some features of the new pmap are:
- It allows L1 descriptor tables to be shared efficiently between
multiple processes. A typical "maxusers 32" kernel, where NPROC is set
to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
with just 4 L1s. This completely solves the problem of running out
of contiguous physical memory for allocating new L1s at runtime on a
busy system.
- Much improved cache/TLB management "smarts". This change ripples
out to encompass the low-level context switch code, which is also
much smarter about when to flush the cache/TLB, and when not to.
- Faster allocation of L2 page tables and associated metadata thanks,
in part, to the pool_cache enhancements recently contributed to
NetBSD by Wasabi Systems.
- Faster VM space teardown due to accurate referenced tracking of L2
page tables.
- Better/faster cache-alias tracking.
The new pmap is enabled by adding options ARM32_PMAP_NEW to the kernel
config file, and making the necessary changes to the port-specific
initarm() function. Several ports have already been converted and will
be committed shortly.