This probably costs 1 clock (on modern cpus) in the normal case.
But gives a big benefit when the destination is misaligned.
In particular when the source has the same misalignment - although
that may not be a gain on Nehalem!
Fixes PR/35535
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).
tested by booting a kernel in qemu and compile-testing i386/ALL
Since 'rep stos' will have a long setup time, avoid doing it more than once.
For misaligned (start address or length) write an unaligned word at both
ends of the buffer then aligned 'rep stosd' the middle.
Use the same code for bzero().
bzero.S is left being compiled for a while (empty) - to avoid issues with
duplicate symbols in libc.a after update builds.
than test) so that the condition code is set correctly (and fix the
comments: 0x10->0x01 and ^->&). From Anon Ymous
XXX: There are similar comment errors in some of the other string code.
XXX: We really need a regression test that includes misaligned memory
with searches designed to catch corner cases such as searching for 0,
-1, etc, and search length limit violations. Searching for 0 on
misaligned memory would have caught this problem.
Always read aligned words, invalidating unwanted bytes in first word,
and checking that any match in the last word is before the buffer end.
No loops apart from the one through the data.
1) doesn't do byte compares to find which byte matched
2) doesn't do byte compares if any top bits are set
3) doesn't use a loop when the input is misaligned
4) has less mispredicted branches
Passes regression tests and 'build.sh' doesn't explode (and more than usual).
Then use bit scan to work out which byte is zero.
If the source is misaligned read the aligned word and make the unwanted
(low order) bytes non-zero.
Passes regression test - which probably tests just enough cases.
as tsutsui@ suggested, and include <sys/param.h> in sha2.c instead.
On the vax, this causes <machine/macros.h> to be included, and it contains
that machine's memset() macro+inline.