The earlier implementation mishandled bytes equal to 0x01 and gave a
wrong result for them. Our application test cases for strings did not
catch this, as the ASCII value 0x01 is rare in those.
This changes the algorithm to:
1. Mask out the high bit of each byte (so that each byte is <= 127).
2. Add 127 to each byte (i.e. if the low 7 bits are not 0, this will overflow
into the highest bit of each byte).
3. Bitwise-or the original value back in (to cover those cases where the
source byte was exactly 128) to saturate the high-bit.
4. Shift-and-mask (implemented as a mask-and-shift) to extract the MSB of
each byte into its LSB.
5. Multiply with 0xff to fan out the LSB to all bits of each byte.
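
The five steps above can be sketched in plain C as follows. This is
only an illustration on a 64-bit value, not the TCG code in the patch;
the names orc_b_sketch and orc_b_sketch_selfcheck are hypothetical and
the 64-bit register width is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of orc.b: each nonzero byte becomes 0xff, each zero byte stays 0x00. */
static inline uint64_t orc_b_sketch(uint64_t x)
{
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL; /* low 7 bits of every byte */
    const uint64_t msb  = 0x8080808080808080ULL; /* high bit of every byte */

    uint64_t t = x & low7;  /* 1. clear the high bit, so each byte is <= 127 */
    t += low7;              /* 2. add 127 per byte; each byte is at most 0xfe, so no carry crosses bytes */
    t |= x;                 /* 3. or the source back in, covering bytes that were exactly 0x80 */
    t = (t & msb) >> 7;     /* 4. mask-and-shift: move each byte's MSB into its LSB */
    return t * 0xff;        /* 5. fan the LSB out to all 8 bits of each byte */
}

/* Hypothetical self-check, including the 0x01 corner case from this fix. */
static inline void orc_b_sketch_selfcheck(void)
{
    assert(orc_b_sketch(0x00) == 0x00);
    assert(orc_b_sketch(0x01) == 0xff);
    assert(orc_b_sketch(0x80) == 0xff);
}
```

Step 2 cannot carry from one byte into the next because step 1 limits
each byte to 0x7f, and step 5 cannot carry because each byte is 0 or 1
before the multiplication.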
Fixes: d7a4fcb034 ("target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci")
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Reported-by: Vincent Palatin <vpalatin@rivosinc.com>
Tested-by: Vincent Palatin <vpalatin@rivosinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20211013184125.2010897-1-philipp.tomsich@vrull.eu
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>