a220774a13
gcc does a lousy job at compiling 128-bit NEON intrinsics on arm32; hand-writing it made it about 12x faster, by avoiding a zillion loads and stores to spill everything and the kitchen sink onto the stack. (But gcc does fine on aarch64, presumably because it has twice as many registers and doesn't have to deal with q2=d4/d5 overlapping.)

- adiantum
- aes
- blowfish
- camellia
- cast128
- cprng_fast
- des
- nist_hash_drbg
- rijndael
- skipjack