Copy propagation introduced in 22613af4a6
considered only global registers. However, register temps and stack
allocated locals must be handled differently because register temps
don't survive across brcond.
Fix by propagating only within same class of temps.
Tested-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Fix breakage by a640f03178
and 55c0975c5b.
Some TCG targets don't implement all TCG ops, so make
optimizing those conditional.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Perform constant folding for NOT and EXT{8,16,32}{S,U} operations.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Perform actual constant folding for ADD, SUB and MUL operations.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Make tcg_constant_folding do copy and constant propagation. It is a
preparational work before actual constant folding.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Added file tcg/optimize.c to hold TCG optimizations. Function tcg_optimize
is called from tcg_gen_code_common. It calls other functions performing
specific optimizations. Stub for constant folding was added.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
cppcheck reports an error:
qemu/tcg/mips/tcg-target.c:1487: error: Invalid number of character (()
The unpatched code won't compile on mips hosts starting with commit
cea5f9a28f.
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Expand the note on the number of TCG ops generated per target insn,
to be clearer about the range of applicability of the 20 op rule
of thumb. Also add a note about the hard MAX_OP_PER_INSTR limit.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Use stack instead of temp_buf array in CPUState for TCG temps.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Use TCG_REG_CALL_STACK instead of TCG_REG_SP for consistency.
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Use stack instead of temp_buf array in CPUState for TCG temps.
On Sparc64, stack pointer is not aligned but there is a fixed bias of 2047,
so don't try to enforce alignment.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Except for specific cases where the use of %esp changes the encoding of
the instruction, it's cleaner to use TCG_REG_CALL_STACK instead of
TCG_REG_ESP.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The code for stack allocation for call arguments is way too simplistic
to actually work on targets with non-trivial stack allocation policies,
e.g. ppc64. We've also already allocated TCG_STATIC_CALL_ARGS_SIZE worth
of stack for calls which should be well more than any helper needs.
Remove broken dynamic stack allocation code and replace it with an assert.
Should dynamic stack allocation ever be needed again, target specific
functions should be added.
Thanks to Richard Henderson for the analysis.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Make functions take a parameter for CPUState instead of relying
on global env. Pass CPUState pointer to TCG prologue, which moves
it to AREG0.
Thanks to Peter Maydell and Laurent Desnogues for the ARM prologue
change.
Revert the hacks to avoid AREG0 use on Sparc hosts.
Move cpu_has_work() and cpu_pc_from_tb() from exec.h to cpu.h.
Compile the file without HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Based on a patch from Hans de Goede <hdegoede@redhat.com>
This warning is new in gcc 4.6.
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When compiling with DEBUG_TCGV enabled, make the TCGv_ptr type distinct
from TCGv_i32/TCGv_i64. This means that using an i32 or i64 TCG op to
manipulate a TCGv_ptr will always be detected at compile time, rather
than only if compiling on a host system with the other word size.
NB: the tcg_add_ptr and tcg_sub_ptr macros have been removed as they
were not used anywhere.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The prototypes for the ld/st functions on a 64 bit host declared
the address parameter as a TCGv_i64 rather than a TCGv_ptr. This
worked OK (since the two are aliases), but needs to be fixed to
allow extension of TCG type debugging to i64/i32/ptr mismatches.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use the correct header in the TCG MIPS code to find cacheflush() on OpenBSD
to fix compilation of the MIPS host support for OpenBSD/mips64 based architecures.
Signed-off-by: Brad Smith <brad@comstyle.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If an op with dead outputs is not removed, because it has side effects
or has multiple output and only one dead, mark the registers as dead
instead of saving them. This avoid a few register spills on TCG targets
with low register count, especially with div2 and mul2 ops, or when a
qemu_ld* result is not used (prefetch emulation for example).
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If an op is not removed and has dead output arguments, mark it
in op_dead_args similarly to what is done for input arguments.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Allow all args to be dead by replacing the input specific op_dead_iargs
variable by op_dead_args. Note this is a purely mechanical change.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Although the TCG generated code is always in ARM mode, it is possible
that the host code was compiled by gcc in Thumb mode (this is often the
default for Linux distributions targeting ARM v7 only). Handle this
by using BLX imm when doing a call from ARM into Thumb mode.
Since BLX imm is not a conditionalisable instruction, we make
tcg_out_call() no longer take a condition code; we were only ever
using it with COND_AL anyway.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
Add support (if CONFIG_DEBUG_TCG is defined) for debugging leakage
of temporary variables. Generally any temporaries created by
a target while it is translating an instruction should be freed
by the end of that instruction; otherwise carefully crafted
guest code could cause TCG to run out of temporaries and assert.
By calling tcg_check_temp_count() after each instruction we can
check that we are not leaking temporaries in this way.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add a comment about cache coherency and retranslation, so that people
developping new targets based on existing ones are warned of the issue.
Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Improve constant loading in two ways:
- On all ARM versions, it's possible to load 0xffffff00 = -0x100 using
the mvn rd, #0. Fix the conditions.
- On <= ARMv6 versions, where movw and movt are not available, load the
constants using mov and orr with rotations depending on the constant
to load. This is very useful for example to load constants where the
low byte is 0. This reduce the generated code size by about 7%.
Also fix the coding style at the same time.
Cc: Andrzej Zaborowski <balrog@zabor.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
TCG on MIPS was trying to avoid changing the branch offset, but didn't
due to a stupid typo. Fix it.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Due to a typo, qemu_st64 doesn't properly byteswap the 32-bit low word of
a 64 bit word before saving it. This patch fixes that.
Acked-by: Andrzej Zaborowski <balrogg@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
QEMU uses code retranslation to restore the CPU state when an exception
happens. For it to work the retranslation must not modify the generated
code. This is what is currently implemented in ARM TCG.
However on CPU that don't have icache/dcache/memory synchronised like
ARM, this requirement is stronger and code retranslation must not modify
the generated code "atomically", as the cache line might be flushed
at any moment (interrupt, exception, task switching), even if not
triggered by QEMU. The probability for this to happen is very low, and
depends on cache size and associativiy, machine load, interrupts, so the
symptoms are might happen randomly.
This requirement is currently not followed in tcg/arm, for the
load/store code, which basically has the following structure:
1) tlb access code is written
2) conditional fast path code is written
3) branch is written with a temporary target
4) slow path code is written
5) branch target is updated
The cache lines corresponding to the retranslated code is not flushed
after code retranslation as the generated code is supposed to be the
same. However if the cache line corresponding to the branch instruction
is flushed between step 3 and 5, and is not flushed again before the
code is executed again, the branch target is wrong. In the guest, the
symptoms are MMU page fault at a random addresses, which leads to
kernel page fault or segmentation faults.
The patch fixes this issue by avoiding writing the branch target until
it is known, that is by writing only the branch instruction first, and
later only the offset.
This fixes booting linux guests on ARM hosts (tested: arm, i386, mips,
mipsel, sh4, sparc).
Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The usermode version of qemu_ld doesn't used mem_index,
leading to set-but-not-used warnings.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Edgar E. Iglesias <edgar@axis.com>
A typo in the usermode address calculation path; R3 used where R2 needed.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Edgar E. Iglesias <edgar@axis.com>
Use ld4 not ld8 for reading the tlb of 32-bit targets.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Edgar E. Iglesias <edgar@axis.com>
The port was not properly merged following
86feb1c860
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Edgar E. Iglesias <edgar@axis.com>