We were not allocating TCG_STATIC_CALL_ARGS_SIZE, so this meant that
any helper with more than 4 arguments would clobber the saved regs.
Realizing that we're supposed to have this memory pre-allocated means
we can clean up the tcg_out_arg functions, which were trying to do
more stack allocation.
Allocate stack memory for the TCG temporaries while we're at it.
Signed-off-by: Richard Henderson <rth@twiddle.net>
On 32-bit TCG targets, when emulating deposit_i64 with a mov_i32 +
deposit_i32, care should be taken to not overwrite the low part of
the second argument before the deposit when it is the same the
destination.
This fixes the shld instruction in qemu-system-x86_64, which in turns
fixes booting "system rescue CD version 2.8.0" on this target.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG optimizer does great work when inserting constants, being able
to fold the open-coded deposit expansion to just an AND or an OR. Avoid
a bit the regression caused by having the deposit opcode by expanding
deposit of zero as an AND.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Glibc 2.16 includes an easy way to get feature bits previously
buried in /proc or the program startup auxiliary vector. Use it.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There are a few simple special cases that should be handled first.
Break these out to subroutines to avoid code duplication.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It takes half the cycles to read one CR register instead of all 8.
This is a backward compatible addition to the ISA, so chips prior
to Power 2.00 spec will simply continue to read the entire CR register.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Nothing else in the call chain ensures that these
constants don't have garbage in the high bits.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The optimization/bug being fixed is that tcg_out_cmp was not applying the
right type to loading a constant, in the case it can't be implemented
directly. Rather than recomputing the TCGType enum from the arch64 bool,
pass around the original TCGType throughout.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The mul_i32 pattern was loading non-16-bit constants into a register,
when we can get the middle-end to do that for us. The mul_i64 pattern
was not considering that MULLI takes 64-bit inputs.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since we have special code to handle and/or/xor with a constant,
apply the same to andc/orc/eqv with a constant.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Using a table to look up insns of the right width and sign.
Include support for the Power 2.06 LDBRX and STDBRX insns.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Handle constants in common code; we'll want to reuse that later.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Improve constant addition -- previously we'd emit useless addi with 0.
Use new constraints to force the driver to pull full 64-bit constants
into a register.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We'll need a zero, and Z makes more sense for that. Make sure we
have a full compliment of signed and unsigned 16 and 32-bit tests.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The test for using movi32 was sub-optimal for TCG_TYPE_I32, comparing
a signed 32-bit quantity against an unsigned 32-bit quantity.
When possible, use addi+oris for 32-bit unsigned constants. Otherwise,
standardize on addi+oris+ori instead of addis+ori+rldicl.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We weren't ignoring the high 32 bits during a NE comparison.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
* 'tci' of git://qemu.weilnetz.de/qemu:
tci: Make tcg temporaries local to tcg_qemu_tb_exec
tci: Delete unused tb_ret_addr
tci: Avoid code before declarations
tci: Use a local variable for env
tci: Use 32-bit signed offsets to loads/stores
We're moving away from the temporaries stored in env. Make sure we can
differentiate between temp stores and possibly bogus stores for extra
call arguments. Move TCG_AREG0 and TCG_REG_CALL_STACK out of the way
of the parameter passing registers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off by: Stefan Weil <sw@weilnetz.de>
Since the change to tcg_exit_req, the first insn of every TB is
a load with a negative offset from env.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off by: Stefan Weil <sw@weilnetz.de>
When the TCG condition codes were re-organized last year,
we failed to update all of the "old-style" tests for unsigned.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This can save one insn, if the constant has any bits in 32-63 set,
but no bits in 21-31 set. It never results in more insns.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since we're always in 64-bit mode, load address performs a full
64-bit add. Use that for 3-address addition, as well as for
larger constant addends when we lack extended-immediates facility.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since we have a free temporary and can always just load the constant, we
ought to do so, rather than spending the same effort constraining the const.
Signed-off-by: Richard Henderson <rth@twiddle.net>
We only support 64-bit code generation for s390x.
Don't clutter the code with ifdefs that suggest otherwise.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Set TCG_TARGET_CALL_STACK_OFFSET properly for the abi. Allocate the
standard TCG_STATIC_CALL_ARGS_SIZE. And while we're at it, allocate
space for CPU_TEMP_BUF_NLONGS.
Signed-off-by: Richard Henderson <rth@twiddle.net>