* Replace __byte_swap_32_variable() with a C version from Richard
Earnshaw that generates nearly identical assembly (it would be
exactly identical with the addition of another peephole to the GCC
ARM back-end).
* Byte-swap 16-bit and 32-bit constants at compile time.
* Inline 16-bit and 32-bit variable byte-swaps. These take 3 and 4
instructions, respectively, and inlining saves the minimum 6-cycle
penalty of calling and returning from the byte-swap function.
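
For illustration only, the three changes above can be sketched roughly
as follows. This is not the committed code: the names SWAP16_CONST,
SWAP32_CONST, swap16_var, swap32_var, SWAP16 and SWAP32 are
hypothetical, the 32-bit variable swap is the well-known XOR/rotate
sequence written in plain C (assumed here to resemble the new version,
not confirmed to match it), and the constant test assumes GCC's
__builtin_constant_p.

```c
#include <stdint.h>

/* Constant swaps: pure expressions the compiler can fold at compile time. */
#define SWAP16_CONST(x) \
    ((uint16_t)((((x) & 0xff00u) >> 8) | (((x) & 0x00ffu) << 8)))

#define SWAP32_CONST(x) \
    ((uint32_t)((((x) & 0xff000000u) >> 24) | (((x) & 0x00ff0000u) >> 8) | \
                (((x) & 0x0000ff00u) << 8)  | (((x) & 0x000000ffu) << 24)))

/* Inline 16-bit variable swap (hypothetical name). */
static inline uint16_t
swap16_var(uint16_t v)
{
    /* The cast back to uint16_t discards the bits shifted above bit 15. */
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Inline 32-bit variable swap (hypothetical name). */
static inline uint32_t
swap32_var(uint32_t v)
{
    uint32_t t;

    t = v ^ ((v >> 16) | (v << 16));  /* v XOR (v rotated right by 16) */
    t &= ~0x00ff0000u;                /* clear bits 16..23 */
    v = (v >> 8) | (v << 24);         /* rotate v right by 8 */
    return v ^ (t >> 8);              /* fix up the two misplaced bytes */
}

/*
 * Dispatch: constants take the foldable macro, everything else takes the
 * inline function. __builtin_constant_p is a GCC extension (assumption).
 */
#define SWAP16(x) \
    (__builtin_constant_p(x) ? SWAP16_CONST(x) : swap16_var(x))
#define SWAP32(x) \
    (__builtin_constant_p(x) ? SWAP32_CONST(x) : swap32_var(x))
```

With this sketch, SWAP32(0x12345678) is a compile-time constant, while
SWAP32(n) for a run-time n expands to the inline function, so neither
case pays for a function call.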