(Note: memcmp/memset improvements also benefit non-Xscale).
memcmp() - Compare 32 bits at a time if possible. Special-case 6-byte
comparisons, for the benefit of the network stack.
memset() - More loop unrolling, plus use of 'strd' instruction,
results in > 100% speedup on Xscale.
memcpy() - Big-endian support, unrolled loops, 'strd/ldrd/pld', plus
special-cases for very common length/alignment combinations
(at least in the kernel). Benchmarks show ~50% improvement on
Xscale.
memmove() - Big-endian support. Use fast memcpy(), above, if the regions
don't overlap. Otherwise unchanged.
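For illustration only, the word-at-a-time compare and the "use memcpy() when the
regions don't overlap" dispatch might look roughly like this in portable C. The
committed routines are hand-tuned ARM assembly; the names sketch_memcmp and
sketch_memmove are hypothetical, and this sketch does not reproduce the
6-byte special case or the strd/ldrd/pld usage.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: portable stand-in for the word-wise compare idea. */
static int sketch_memcmp(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a;
    const unsigned char *q = b;

    /* Compare 32 bits at a time while a full word remains.
     * Loading through memcpy() sidesteps alignment restrictions. */
    while (len >= sizeof(uint32_t)) {
        uint32_t x, y;

        memcpy(&x, p, sizeof(x));
        memcpy(&y, q, sizeof(y));
        if (x != y)
            break;              /* byte loop below finds the difference */
        p += sizeof(x);
        q += sizeof(y);
        len -= sizeof(x);
    }

    /* Byte-wise tail; also resolves ordering inside a differing word,
     * which keeps the result independent of endianness. */
    while (len-- > 0) {
        if (*p != *q)
            return (int)*p - (int)*q;
        p++;
        q++;
    }
    return 0;
}

/* Hypothetical name: use the fast forward copy unless the regions overlap. */
static void *sketch_memmove(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)dst;
    uintptr_t sa = (uintptr_t)src;

    /* Disjoint regions: memcpy() is safe and is the fast path. */
    if (da >= sa + len || sa >= da + len)
        return memcpy(dst, src, len);

    if (da < sa) {
        /* Destination below source: copy forwards. */
        for (size_t i = 0; i < len; i++)
            d[i] = s[i];
    } else {
        /* Destination above source: copy backwards. */
        while (len-- > 0)
            d[len] = s[len];
    }
    return dst;
}
```

Breaking out to the byte loop on a mismatching word, rather than comparing the
two words numerically, is one way to get the same answer on big-endian and
little-endian targets.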
- put a 'standards conforming' memcmp into memcmp.c
- make bcmp be a second label on the same code
- make bcmp.c be just #include "memcmp.c"
This means that libsa.o might contain both a memcmp.o and a bcmp.o, but
both contain the same code (defining both symbols) so it doesn't matter
which ld uses.
Saves worrying about which of bcmp.c and memcmp.c the architecture specific
Makefile requests.
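A minimal sketch of the idea, assuming hypothetical names (sa_memcmp, sa_bcmp)
so that it does not clash with libc. The commit makes bcmp a second label on
the same code; plain C has no direct equivalent, so a thin wrapper stands in
for the shared entry point here.

```c
#include <stddef.h>

/* Hypothetical name: a standards-conforming compare that returns the
 * difference of the first mismatching bytes, taken as unsigned char. */
int sa_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1;
    const unsigned char *p2 = s2;

    for (; n > 0; n--, p1++, p2++) {
        if (*p1 != *p2)
            return (int)*p1 - (int)*p2;
    }
    return 0;
}

/* Hypothetical name: bcmp() only promises zero or non-zero, so any
 * conforming memcmp() result is also a valid bcmp() result. */
int sa_bcmp(const void *s1, const void *s2, size_t n)
{
    return sa_memcmp(s1, s2, n);
}
```

Because one source file provides both symbols, it no longer matters whether a
Makefile lists the memcmp or the bcmp source.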
So make bcmp.c define bcmp and memcmp.
This should (?) fix the atari build.
(I've now no idea why the previous change defined memcpy for the alpha build.)
- use file buffer for all block reads
- only save a small amount of the indirect block list
Allows i386 bootxx_ufs code to load /boot from a filesystem with 32k blocks
while still fitting inside 64k of memory.
Code size reduced as well (by ~1k on i386).
It ought to be possible to use a buffer that is smaller than a filesystem
block. This might be needed in order to boot from filesystems with larger
block sizes.
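To illustrate why keeping only part of the indirect block list saves memory:
with 32k blocks and (assumed) 32-bit block pointers, a full indirect block
holds 8192 entries, i.e. 32k of memory by itself. The sketch below keeps only
a small window of entries. All names (indir_cache, read_window, indir_lookup,
WINDOW) and the in-memory stand-in for the disk are hypothetical; this is not
the actual bootxx_ufs code.

```c
#include <stdint.h>
#include <string.h>

#define FS_BSIZE 32768                          /* assumed fs block size */
#define NINDIR   (FS_BSIZE / sizeof(int32_t))   /* 8192 pointers per block */
#define WINDOW   16                             /* hypothetical: entries kept */

/* Hypothetical stand-in for an on-disk indirect block. */
static int32_t fake_indirect[NINDIR];
static unsigned window_loads;                   /* counts simulated reads */

struct indir_cache {
    int      valid;
    uint32_t first;                             /* index of first cached entry */
    int32_t  ptr[WINDOW];                       /* 64 bytes instead of 32k */
};

/* Hypothetical read routine: fetch WINDOW entries starting at 'first'.
 * A real loader would read these through its file buffer. */
static void read_window(struct indir_cache *c, uint32_t first)
{
    memcpy(c->ptr, &fake_indirect[first], sizeof(c->ptr));
    c->first = first;
    c->valid = 1;
    window_loads++;
}

/* Map an index within the indirect block to a disk block number,
 * reloading the window only when the index falls outside it. */
static int32_t indir_lookup(struct indir_cache *c, uint32_t idx)
{
    uint32_t first = idx - (idx % WINDOW);

    if (!c->valid || c->first != first)
        read_window(c, first);
    return c->ptr[idx - first];
}
```

Sequential reads of a file touch consecutive entries, so a small window is
reloaded only once per WINDOW blocks in this sketch.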
totally different version than was here. This version, of course, has a
BSD license on it while the old one did not. This one also compiles down
to tighter code--the smaller the better for libkern & libsa.
overridden if MACHINE_LOADFILE_MACHDEP is defined.
This makes life much simpler in the face of the myriad of
different boot options for the evb* ports.