"/usr/include/machine/varargs.h") by a stub include file which will
emit an error if GCC 3.3 or newer is used and include "machine/varargs.h"
otherwise.
Based on a suggestion by Richard Earnshaw. This fixes PR toolchain/22888
by myself.
first three bytes to determine how much of the BPB to preserve.
Supported values:
eb 3c 90 FAT16 BPB
eb 58 90 FAT32 BPB
(anything else) don't preserve any BPB
This is because the BPB is generally only the FAT16 one except in the
bootxx_msdos case, where it's the larger FAT32 one.
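
A minimal sketch of the three-byte check described above, assuming the bytes
are read from the start of a boot-sector image. The names bpb_kind and
bpb_kind_from_jump are hypothetical and not taken from the commit; the sketch
only classifies the image and leaves the actual BPB sizes to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification; the real code may represent this differently. */
enum bpb_kind { BPB_NONE, BPB_FAT16, BPB_FAT32 };

/*
 * Inspect the first three bytes (the x86 jump + nop) and report which
 * BPB layout, if any, should be preserved.
 */
enum bpb_kind
bpb_kind_from_jump(const unsigned char *boot, size_t len)
{
	static const unsigned char fat16_jmp[3] = { 0xeb, 0x3c, 0x90 };
	static const unsigned char fat32_jmp[3] = { 0xeb, 0x58, 0x90 };

	if (boot == NULL || len < sizeof(fat16_jmp))
		return BPB_NONE;
	if (memcmp(boot, fat16_jmp, sizeof(fat16_jmp)) == 0)
		return BPB_FAT16;
	if (memcmp(boot, fat32_jmp, sizeof(fat32_jmp)) == 0)
		return BPB_FAT32;
	/* Anything else: don't preserve any BPB. */
	return BPB_NONE;
}
```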
(Note: memcmp/memset improvements also benefit non-Xscale).
memcmp() - Compare 32 bits at a time if possible. Special-case 6-byte
comparisons, for the benefit of the network stack.
memset(), bzero() - More loop unrolling, plus use of the 'strd' instruction,
results in > 100% speedup on Xscale.
memcpy() - Big-endian support, unrolled loops, 'strd/pld', plus special-
cases for very common length/alignment combinations.
Benchmarks show ~50% improvement on Xscale.
memmove(), bcopy() - Big-endian support. Use fast memcpy(), above, if the
regions don't overlap. Otherwise unchanged.
XXX: The Xscale optimisations are not enabled by default, unless /etc/mk.conf
XXX: has the right compiler options. The intention is to pull them in via
XXX: something like libxscale.so, selected at runtime by ld.so.conf.
XXX: (Big-endian support is not affected by this).
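
The memmove() dispatch mentioned above can be sketched in portable C. This is
only an illustration of the overlap test, not the committed ARM assembly:
sketch_memmove is a hypothetical name, and the C library's memmove() stands
in for the unchanged overlapping path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hand the copy to memcpy() when the regions are disjoint, otherwise
 * fall back to the overlap-safe path.
 */
void *
sketch_memmove(void *dst, const void *src, size_t len)
{
	uintptr_t d = (uintptr_t)dst;
	uintptr_t s = (uintptr_t)src;

	/* Disjoint when the start addresses are at least len bytes apart. */
	if (d >= s ? d - s >= len : s - d >= len)
		return memcpy(dst, src, len);

	/* Overlapping regions: keep the existing, slower behaviour. */
	return memmove(dst, src, len);
}
```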
Before:
$ dc ..
miyu% dc ..
dc: 02 unimplemented
dc: 0 unimplemented
dc: 0 unimplemented
dc: input base must be a number between 2 and 16 (inclusive)
dc: stack empty
dc: stack empty
dc: 'h' (0150) unimplemented
dc: stack empty
dc: 'u' (0165) unimplemented
...
** get heart attack suspecting major FS corruption **
After:
$ dc ..
Cannot use directory as input!
- Use the "clz" instruction to pick a run-queue, instead of using the
ffs-by-table-lookup method.
- Use strd instead of stmia where possible.
- Use multiple ldr instructions instead of ldmia where possible.
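
As a rough C illustration of the first item, the lowest set bit of a
run-queue bitmask can be found with a count-leading-zeros operation instead
of a lookup table. This assumes a 32-bit mask in which the lowest set bit
names the queue to run, which the commit does not state; lowest_set_bit_clz
is a hypothetical name, and __builtin_clz is the GCC/Clang builtin, used here
in place of the raw "clz" instruction.

```c
#include <stdint.h>

/*
 * Return the index (0..31) of the lowest set bit in mask,
 * or -1 if no bit is set.
 */
int
lowest_set_bit_clz(uint32_t mask)
{
	if (mask == 0)
		return -1;	/* clz of zero is undefined for the builtin */

	/* mask & -mask isolates the lowest set bit. */
	uint32_t lowest = mask & (0u - mask);

	/* A single set bit at index n has 31 - n leading zeros. */
	return 31 - __builtin_clz(lowest);
}
```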
they use the mini D$.
This results in a small performance boost on xscale platforms, since
flushing the main cache on a context switch won't affect the kernel
stack/pcb.
with the KVA of the newly-wired uarea.
This is useful on some architectures (e.g. xscale) where the uarea mapping
can be tweaked to use the mini-data cache instead of the main cache.
(Note: memcmp/memset improvements also benefit non-Xscale).
memcmp() - Compare 32 bits at a time if possible. Special-case 6-byte
comparisons, for the benefit of the network stack.
memset() - More loop unrolling, plus use of the 'strd' instruction,
results in > 100% speedup on Xscale.
memcpy() - Big-endian support, unrolled loops, 'strd/ldrd/pld', plus
special-cases for very common length/alignment combinations
(at least in the kernel). Benchmarks show ~50% improvement on
Xscale.
memmove() - Big-endian support. Use fast memcpy(), above, if the regions
don't overlap. Otherwise unchanged.
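
The 6-byte special case in memcmp() can be pictured as follows. This is a
sketch in C rather than the committed assembly, and it assumes the 6-byte
length is singled out because link-layer addresses of that size are compared
often in the network stack. sketch_memcmp is a hypothetical name, and the
library memcmp() stands in for the general path.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare two buffers, with a fully unrolled fast path for len == 6.
 * The result is <0, 0 or >0, as for memcmp().
 */
int
sketch_memcmp(const void *a, const void *b, size_t len)
{
	const unsigned char *p = a;
	const unsigned char *q = b;
	int d;

	if (len == 6) {
		/* No loop overhead: six straight-line byte comparisons. */
		if ((d = p[0] - q[0]) != 0) return d;
		if ((d = p[1] - q[1]) != 0) return d;
		if ((d = p[2] - q[2]) != 0) return d;
		if ((d = p[3] - q[3]) != 0) return d;
		if ((d = p[4] - q[4]) != 0) return d;
		return p[5] - q[5];
	}

	/* General case: defer to the ordinary implementation. */
	return memcmp(a, b, len);
}
```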
userspace). Having them here is both good and bad. Good because they're
close to the actual native ones, bad because it exposes things out
of compat/netbsd32. However, putting them exclusively in the latter
requires a lot of reshuffling in the includes there, so this will
do for now.