(Note: memcmp/memset improvements also benefit non-Xscale).
memcmp() - Compare 32-bits at a time if possible. Special-case 6-byte
comparisons, for the benefit of the network stack.
memset() - More loop unrolling, plus use of 'strd' instruction,
bzero() results in > 100% speedup on Xscale.
memcpy() - Big-endian support, unrolled loops, 'strd/pld', plus special-
cases for very common length/alignment combinations.
Benchmarks show ~50% improvment on Xscale.
memmove() - Big-endian support. Use fast memcpy(), above, if the regions
bcopy() don't overlap. Otherwise unchanged
XXX: The Xscale optimisations are not enabled by default, unless /etc/mk.conf
XXX: has the right compiler options. The intention is to pull them in via
XXX: something like libxscale.so, selected at runtime by ld.so.conf.
XXX: (Big-endian support is not affected by this).
. this function cannot have _PROF_PROLOGUE
. this function cannot be called via PLT
Change ENTRY to NENTRY for the former.
XXX: The latter is important in the gcc3 world, that have shared
libraries. We will need to play tricks with .hidden to make sure
every shared library gets its own private __udivsi3 that it can call
directly, without going through the PLT.
* DPSRCS contains extra dependencies, but is _NOT_ added to CLEANFILES.
This is a change of behaviour. If a Makefile wants the clean semantics
it must specifically append to CLEANFILES.
Resolves PR toolchain/5204.
* To recap: .d (depend) files are generated for all files in SRCS and DPSRCS
that have a suffix of: .c .m .s .S .C .cc .cpp .cxx
* If YHEADER is set, automatically add the .y->.h to DPSRCS & CLEANFILES
* Ensure that ${OBJS} ${POBJS} ${LOBJS} ${SOBJS} *.d depend upon ${DPSRCS}
* Deprecate the (short lived) DEPENDSRCS
Update the various Makefiles to these new semantics; generally either
adding to CLEANFILES (because DPSRCS doesn't do that anymore), or replacing
specific .o dependencies with DPSRCS entries.
Tested with "make -j 8 distribution" and "make distribution".