faster (on the i486 & i586) rorw $8, %w1. The inline assembly for GCC 1.X already used rorw. Using rorw is one byte longer, but we wouldn't be inlining at all if we weren't optimizing for speed.
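A minimal sketch of the rorw approach, assuming GCC-compatible extended asm on x86. This is not the actual header the note describes. The function names `swap16_rorw` and `swap16_c` are hypothetical. Rotating a 16-bit value by 8 bits in either direction exchanges its two bytes, so `rorw $8` is a byte swap. In this sketch the output is operand 0 and the input is tied to it with the `"0"` constraint, so the word-sized register is named `%w0`. Operand numbering depends on the constraint list used. On non-x86 targets the sketch falls back to the portable shift-and-or form.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference (hypothetical name): swap the two bytes of a 16-bit value. */
static inline uint16_t swap16_c(uint16_t x)
{
    /* x is promoted to int, so x << 8 cannot overflow; the cast drops bits above 15. */
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Byte swap by rotating the 16-bit register by 8 bits (hypothetical name).
   Uses GCC extended asm on x86 only; other targets use the C version. */
static inline uint16_t swap16_rorw(uint16_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint16_t v;
    /* Operand 0 is the output; "0" places the input in the same register.
       %w0 prints that register's 16-bit name (e.g. %ax). ROR changes
       the flags, so "cc" is listed as clobbered. */
    __asm__ ("rorw $8, %w0" : "=r" (v) : "0" (x) : "cc");
    return v;
#else
    return swap16_c(x);
#endif
}
```

For example, `swap16_rorw(0x1234)` yields `0x3412`. Applying the swap twice returns the original value.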