d7c41a60d0
The shift instructions are rewritten instead of reusing code from the old decoder. Rotates use CC_OP_ADCOX more extensively and generally rely more on the optimizer, so that the code generators are shared between the immediate-count and variable-count cases. In particular, this makes gen_RCL and gen_RCR pretty efficient for the count == 1 case, which becomes (apart from a few extra movs) something like: (compute_cc_all if needed) // save old value for OF calculation mov cc_src2, T0 // the bulk of RCL is just this! deposit T0, cc_src, T0, 1, TARGET_LONG_BITS - 1 // compute carry shr cc_dst, cc_src2, length - 1 and cc_dst, cc_dst, 1 // compute overflow xor cc_src2, cc_src2, T0 extract cc_src2, cc_src2, length - 1, 1 32-bit MUL and IMUL are also slightly more efficient on 64-bit hosts. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
---|---|---|
.. | ||
alpha | ||
arm | ||
avr | ||
cris | ||
hexagon | ||
hppa | ||
i386 | ||
loongarch | ||
m68k | ||
microblaze | ||
mips | ||
openrisc | ||
ppc | ||
riscv | ||
rx | ||
s390x | ||
sh4 | ||
sparc | ||
tricore | ||
xtensa | ||
Kconfig | ||
meson.build |