(Note: memcmp/memset improvements also benefit non-Xscale).
memcmp() - Compare 32-bits at a time if possible. Special-case 6-byte
comparisons, for the benefit of the network stack.
memset() - More loop unrolling, plus use of 'strd' instruction,
results in > 100% speedup on Xscale.
memcpy() - Big-endian support, unrolled loops, 'strd/ldrd/pld', plus
special-cases for very common length/alignment combinations
(at least in the kernel). Benchmarks show ~50% improvment on
Xscale.
memmove() - Big-endian support. Use fast memcpy(), above, if the regions
don't overlap. Otherwise unchanged.
bcopy.S is no longer needed
memmove and memcpy were both stacking r0 and unstacking it to keep the return value, so push this down into _memcpy.
rename _memcpy.S to memcpy.S.
memmove.S is now just a placeholder otherwise the make system automagically adds a memmove.c file to libkern.
memmove is just another entry point for memcpy.
them.
These are identical to the current arm32 sources with the following exceptions:
- References to C labels are wrapped in _C_LABEL().
- Function returns have an alternate version inside #ifdef __APCS_26__.